ABSTRACT


Empirical Orthogonal Function (EOF) analysis is used to
describe the synoptic forcing features of selected northwestern
Pacific Ocean tropical cyclones from 1967 to 1976. EOF analysis
is applied to the geopotential field at 850, 700 and 500mb
on a 120 point grid with 5 degree latitude and longitude
spacing that is centered on the storm. The 120 EOF coefficients
(for each level) are computed for a sample of 454
cases in the history file. The coefficient vectors are truncated
to the first 10 coefficients, based on the Monte Carlo
selection criteria of Preisendorfer and Barnett. These coefficients
describe about 85% of the variance in the fields. The
synoptic forcing represented by the EOF coefficients is then
used as a predictor in a regression analysis track forecast
scheme, along with storm movement and intensity during
the past 36 hours. The EOF-based regression equations are
verified over an independent sample of 50 storms, and the
position errors are compared to the official Joint Typhoon Warning
Center (JTWC) forecast errors. The EOF-based regression equations
give, on the average, a 17% reduction in error when
compared to the official forecast issued by JTWC. Over the
independent sample, the 500mb equations performed better than
the equations of the other two levels.





TABLE OF CONTENTS 


I.   INTRODUCTION

II.  DATA ACQUISITION AND FIELD DEFINITION

III. EMPIRICAL ORTHOGONAL FUNCTIONS

     A. BACKGROUND

     B. MECHANICS OF THE EOF METHOD

     C. SELECTING THE NUMBER OF EIGENVECTORS

     D. ROTATION OF VECTORS

IV.  RESULTANT EMPIRICAL ORTHOGONAL FUNCTIONS

V.   REGRESSION ANALYSIS

VI.  POTENTIAL FOR USE WITH INDEPENDENT DATA

VII. CONCLUSIONS AND FUTURE APPLICATIONS

APPENDIX A: 700 AND 850MB EIGENVECTORS

APPENDIX B: REGRESSION COEFFICIENTS FOR 700 AND 850MB

APPENDIX C: MODIFIED REGRESSION EQUATION RESULTS

LIST OF REFERENCES

INITIAL DISTRIBUTION LIST





LIST OF TABLES 


Table 2-1. The number of valid cases by prior JTWC warning
positions and future JTWC best track position.
See text for details.

Table 4-1. Eigenvalues and cumulative percent explained
variance (in parentheses) for the normalized
[illegible].

Table 4-2. Eigenvalues and standard deviations corresponding
to the modes in Table 4-1 as generated
by the Monte Carlo method (see description
in text).

Table 4-3. Test parameter for the asymptotic theory of
eigenvalues is shown for various modes. See
text for details.

Table 4-4. The correlation coefficient of the reconstructed
field, using the number of modes
indicated, with the actual field being
reconstructed. See text for details.

Table 4-5. Values for the first 10 orthogonal coefficients
for the case of 27 August 1967. (See
text for details.)

Table 4-6. Pearson product moment (correlation) between
the orthogonal coefficient associated with
the given eigenvector and the zonal motion at
12 hour increments. A positive correlation
implies west forcing. Also included is the
instantaneous motion anticipated from the
form of the eigenvectors in Figs. 4-2 to 4-11.

Table 4-7. Similar to Table 4-6, except for meridional
motion and positive correlation implies
[illegible].

The independent storms; their dates of
occurrence, position and intensity, and
their past warning and future best track
history.

Potential predictors used to develop the
regression equations. The first ten predictors
are different for each of the three pressure
levels.

Sample size and R² statistic for each zonal
and meridional regression equation by forecast
time and atmospheric level.

Means and standard deviations of the predictands
(in nautical miles) for the dependent
sample. See text for details.

The regression coefficients for the 7
meridional equations using 500mb EOF's. A
value of .0 indicates the predictor was not
selected in the stepwise selection procedure.

The regression coefficients for the 7 zonal
equations using 500mb EOF's. A value of
.0 indicates the predictor was not selected
in the stepwise selection procedure.

Mean and standard deviation forecast vector
error (nautical miles) of 24, 48 and 72 hours
for the set of 50 independent storms.

Mean and standard deviation forecast vector
error (nautical miles) of 24, 48 and 72
hours for the set of 454 dependent storms.

Mean and standard deviation of forecast
vector magnitude error (n. mi.) for the EOF
regression scheme and the JTWC official
forecast for the independent storms. Only
those storms where both forecasts have valid
[illegible] are compared.

The regression coefficients for the 7
meridional equations using 700mb EOF's. A
value of .0 indicates the predictor was not
selected in the stepwise selection
procedure.

The regression coefficients for the 7
zonal equations using 700mb EOF's. A value
of .0 indicates the predictor was not
selected in the stepwise selection
procedure.

The regression coefficients for the 7
meridional equations using 850mb EOF's. A
value of .0 indicates the predictor was not
selected in the stepwise selection procedure.

The regression coefficients for the 7 zonal
equations using 850mb EOF's. A value of
.0 indicates the predictor was not selected
in the stepwise selection procedure.

Sample size and R² statistic for each zonal
and meridional modified regression equation
by forecast time and atmospheric level.








LIST OF FIGURES


Fig. 2-1.  The moveable 120 point grid on which the D-values
           were extracted relative to the position
           of the storm. The storm is located at grid
           point 70, denoted by [symbol illegible]. Distances
           in degrees latitude and longitude to the various
           grid points are shown. The grid point numbering
           system is demonstrated in the first two columns.

Fig. 2-2.  The mean (composite) D-value field at 500mb.
           Isopleths are deviation in meters from standard
           atmosphere. Storm is always located at grid
           point 70 (X).

Fig. 2-3.  The composite standard deviation D-value
           field (in meters) at 500mb. The storm
           is always located at grid point 70 (X).

Fig. 2-4.  Similar to Fig. 2-2, except for 700mb.
Fig. 2-5.  Similar to Fig. 2-3, except for 700mb.
Fig. 2-6.  Similar to Fig. 2-2, except for 850mb.
Fig. 2-7.  Similar to Fig. 2-3, except for 850mb.

Fig. 3-1.  An example of trivariate principal components.
           See text for details. (Morrison, 1967)

Fig. 4-1.  The largest twenty true eigenvalues of the
           500mb D-value fields compared to the Monte
           Carlo generated eigenvalues for the same modes.
           Monte Carlo eigenvalues are denoted by a
           triangle, the true 500mb values by a circle.

Fig. 4-2.  Eigenvector 1 elements (multiplied by 100) at
           500mb with the tropical cyclone located at
           the X-position.

Fig. 4-3.  Similar to Fig. 4-2 except for eigenvector 2.
Fig. 4-4.  Similar to Fig. 4-2 except for eigenvector 3.
Fig. 4-5.  Similar to Fig. 4-2 except for eigenvector 4.
Fig. 4-6.  Similar to Fig. 4-2 except for eigenvector 5.
Fig. 4-7.  Similar to Fig. 4-2 except for eigenvector 6.
Fig. 4-8.  Similar to Fig. 4-2 except for eigenvector 7.
Fig. 4-9.  Similar to Fig. 4-2 except for eigenvector 8.
Fig. 4-10. Similar to Fig. 4-2 except for eigenvector 9.
Fig. 4-11. Similar to Fig. 4-2 except for eigenvector 10.

Fig. 4-12. 500mb D-value (meters) field surrounding typhoon
           Marge at 0000 GMT 27 August 1967. Marge is
           located at 18°N 125°E (location X).

Fig. 4-13. Reconstruction of 500mb D-value field, 0000 GMT
           27 August 1967, using the first eigenvector and
           orthogonal coefficient. This compares to the
           true field (Fig. 4-12).

Fig. 4-14. Similar to Fig. 4-13, except first three
           eigenvectors are used in reconstruction.
Fig. 4-15. Similar to Fig. 4-13, except first four
           eigenvectors are used in reconstruction.
Fig. 4-16. Similar to Fig. 4-13, except first five
           eigenvectors are used in reconstruction.
Fig. 4-17. Similar to Fig. 4-13, except first nine
           eigenvectors are used in reconstruction.
Fig. 4-18. Similar to Fig. 4-13, except first ten
           eigenvectors are used in reconstruction.
Fig. 4-19. Similar to Fig. 4-13, except first twenty
           eigenvectors are used in reconstruction.
Fig. 4-20. Similar to Fig. 4-13, except first forty
           eigenvectors are used in reconstruction.

Fig. 4-21. Storm displacement from base time position, in
           nautical miles for all storms with 500mb
           coefficient 1 less than negative 9. 12-hour
           movement is indicated by a cross.

Fig. 4-22. Similar to Fig. 4-21, except these storms all
           have 500mb coefficient 1 greater than
           positive 9.

Fig. 5-1.  Comparison of the forecast error for the independent
           data cases. Schemes compared are the
           500mb EOF regression scheme versus the JTWC
           official forecast, for a 24 hour forecast.
           Units are in nautical miles.

Fig. 5-2.  Similar to Fig. 5-1, except the 700mb EOF
           regression forecast is compared to JTWC
           official forecast for a 24 hour forecast.
Fig. 5-3.  Similar to Fig. 5-1, except the 850mb EOF
           regression forecast is compared to JTWC
           official forecast for a 24 hour forecast.
Fig. 5-4.  Similar to Fig. 5-1, except the 500mb EOF
           regression forecast is compared to JTWC
           official forecast for a 48 hour forecast.
Fig. 5-5.  Similar to Fig. 5-1, except the 700mb EOF
           regression forecast is compared to JTWC
           official forecast for a 48 hour forecast.
Fig. 5-6.  Similar to Fig. 5-1, except the 850mb EOF
           regression forecast is compared to JTWC
           official forecast for a 48 hour forecast.
Fig. 5-7.  Similar to Fig. 5-1, except the 500mb EOF
           regression forecast is compared to JTWC
           official forecast for a 72 hour forecast.
Fig. 5-8.  Similar to Fig. 5-1, except the 700mb EOF
           regression forecast is compared to JTWC
           official forecast for a 72 hour forecast.
Fig. 5-9.  Similar to Fig. 5-1, except the 850mb EOF
           regression forecast is compared to JTWC
           official forecast for a 72 hour forecast.

Fig. 5-10. Comparison of the JTWC official forecast over
           the independent data set, as well as the
           complete and homogeneous independent EOF
           regression set and the dependent set errors.
           All EOF results computed from 500mb equations.

Fig. 5-11. Similar to Fig. 5-10, except EOF regression
           results obtained from 700mb equations.
Fig. 5-12. Similar to Fig. 5-10, except EOF regression
           results obtained from 850mb equations.

Fig. 6-1.  Comparison of coefficient 1 derived over
           dependent and independent samples. See text
           for details. On the figures, the middle line
           is the mean and the outer two lines the 95%
           confidence intervals (plus/minus two standard
           deviations). The x-axis is the number of
           cases used.

Fig. 6-2.  Similar to Fig. 6-1 except for coefficient 2.
Fig. 6-3.  Similar to Fig. 6-1 except for coefficient 3.
Fig. 6-4.  Similar to Fig. 6-1 except for coefficient 4.
Fig. 6-5.  Similar to Fig. 6-1 except for coefficient 4
           using absolute differences of the derived
           coefficients only.

Fig. A1-1.  Eigenvector 1 elements (multiplied by 100) at
            700mb with the tropical cyclone located at
            X-position.

Fig. A1-2.  Similar to Fig. A1-1 except for eigenvector 2.
Fig. A1-3.  Similar to Fig. A1-1 except for eigenvector 3.
Fig. A1-4.  Similar to Fig. A1-1 except for eigenvector 4.
Fig. A1-5.  Similar to Fig. A1-1 except for eigenvector 5.
Fig. A1-6.  Similar to Fig. A1-1 except for eigenvector 6.
Fig. A1-7.  Similar to Fig. A1-1 except for eigenvector 7.
Fig. A1-8.  Similar to Fig. A1-1 except for eigenvector 8.
Fig. A1-9.  Similar to Fig. A1-1 except for eigenvector 9.
Fig. A1-10. Similar to Fig. A1-1 except for eigenvector 10.
Fig. A1-11. Similar to Fig. A1-1 except for 850mb level.
Fig. A1-12. Similar to Fig. A1-11 except for eigenvector 2.
Fig. A1-13. Similar to Fig. A1-11 except for eigenvector 3.
Fig. A1-14. Similar to Fig. A1-11 except for eigenvector 4.
Fig. A1-15. Similar to Fig. A1-11 except for eigenvector 5.
Fig. A1-16. Similar to Fig. A1-11 except for eigenvector 6.
Fig. A1-17. Similar to Fig. A1-11 except for eigenvector 7.
Fig. A1-18. Similar to Fig. A1-11 except for eigenvector 8.
Fig. A1-19. Similar to Fig. A1-11 except for eigenvector 9.
Fig. A1-20. Similar to Fig. A1-11 except for eigenvector 10.





ACKNOWLEDGEMENT 


I wish to thank Professor Russ Elsberry for patience and
guidance through the course of the research. He has the
greatest ability a teacher may have: that of guiding a student
to reach for higher potential and learning. Additionally,
I wish to thank Professor R. L. Haney for the valuable
comments on the manuscript. The comments were instructive
and very greatly appreciated.

I would also like to thank Dr. Rudolph Preisendorfer for
taking time out of a very busy schedule to discuss aspects of
the research during the formative stages.

A hearty thanks to Major Dan Brown, USAF, who accomplished
the initial research using this data, and in so doing encountered
and solved the data extraction problems. Additionally,
many of the computer programs used during this research were
simply modifications of programs written by Dan. His initial
work greatly reduced the number of problems.

Additionally, I thank the staff of the W. R. Church computer
center, and in particular Kristina Butler and Jeaneanne Washington,
for all their aid in using the computer. Their aid made coping
with the system much easier, and made time spent with the
system much more profitable.

To fellow students Robert Allen, Charles Hopkins and Barry
Donovan and the rest, for discussing and offering critiques of
the research as time progressed: thank you one and all.

Finally, I would like to dedicate this work to the memory
of my father, Leonard Shaffer, through whom all things became
possible.





I. INTRODUCTION


Tropical storms spawned over the western North Pacific 
Ocean genesis region have great impact on both civilian and 
military populations; accurate movement forecasts are critical 
to reduce their impact upon these communities. The Joint 
Typhoon Warning Center (JTWC), Guam, Marianas Islands, issues 
the official forecast (to United States military agencies) 
of tropical storm movement and intensity for storms generated 
in this region. Using current forecast techniques, these 
official forecasts have an average forecast position error on 
the order of 120, 240 and 360 nautical miles for 24-, 48-, and 
72-hour forecasts (Annual Typhoon Report, JTWC, 1981). There 
is potential for improvement.

Present forecast techniques for tropical storm movement 
may be generally categorized as being either statistical (which 
includes analog techniques) or dynamical. The motivation 
driving the two types of forecasts differs greatly. Statisti- 
cal forecasts typically use regression or analog methods with 
all available historical storms having archived data to pro- 
duce a statistically optimal position forecast. Regression 
analysis methods assume that certain variables deterministically 
correlate with future storm displacement. These correlated 
variables are then used in a regression analysis to produce a 
forecast. Leftwich and Neumann (1977), for example, use a
second order polynomial regression with seven primary predictors
to forecast typhoon movement. The seven predictors include
Julian date, initial latitude and longitude, and past 12-
and 24-hour zonal and meridional movement. Since they used 
polynomial regression, these seven primary predictors actually 
give rise to 35 predictors when the second order predictors 
are formed. Using these predictors, Leftwich and Neumann 
were able to account for 65% of the variation in the zonal 
displacement and 53% of the variation in the meridional dis- 
placement for 12 hours. Over a 72-hour period, the amount of 
explained variance became progressively smaller. Analog techniques
(e.g., Jarrell and Sommervell, 1970) use the historical
file of storms to identify storms, and the surrounding
environmental fields, that have strong similarities to the 
present storm. Then, a weighted similarity index of certain 
variables is used to select those storms in the history file 
that are most similar to the present storm. A weighted aver- 
age of the selected storm tracks is the basis of the forecast 
movement of the present storm. The justification for using 
this technique is that a storm with similar location and 
surrounding fields should also have a similar track. Jarrell 
and Sommervell (1970) present an analog scheme which is the 
original version of the scheme used presently at JTWC.
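
The predictor count quoted above for Leftwich and Neumann (1977) can be checked with a short sketch. It assumes, since the text does not say how the second order predictors are formed, that the 35 predictors are the seven linear terms plus all squares and pairwise cross products, with no constant term. The predictor names are illustrative labels only.

```python
from itertools import combinations_with_replacement

# Seven primary predictors named in the text (labels are illustrative).
primary = [
    "julian_date", "initial_lat", "initial_lon",
    "past_12h_zonal", "past_12h_meridional",
    "past_24h_zonal", "past_24h_meridional",
]


def second_order_terms(names):
    """Return linear terms plus all squares and cross products.

    Assumption: this is how the 35 predictors arise; the source only
    states the totals (7 primary predictors, 35 in all).
    """
    linear = [(name,) for name in names]
    quadratic = list(combinations_with_replacement(names, 2))
    return linear + quadratic


terms = second_order_terms(primary)
print(len(primary), "primary predictors ->", len(terms), "predictors in total")
```

Under this assumption the count is 7 linear terms plus 28 second order terms, which matches the 35 predictors stated in the text.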

In contrast to the statistical methods, dynamic forecast 
techniques assume that the motion of the storm may be forecast
directly from numerical integration of geophysical
governing equations (momentum, continuity and thermodynamic
equations, for example). Harrison (1973) presents a simple
nested grid model to forecast typhoon movement using the primi- 
tive equations. This is the original version of the opera- 
tional nested tropical cyclone model available at JTWC 
(Harrison, 1981). 

Both statistical and dynamical forecast methods have weaknesses.
The statistical methods have two primary problems.
First, since they are based on historical data cases, any
storm that has an unusual motion is not likely to be forecast
well. Second, the use of statistical methods tends to
homogenize (smooth) the forecast. Forecasts using a blend of
similar past history storms are typically insensitive to
subtle differences in the synoptic (dynamic) forcing fields. 
Thus, purely statistical methods have deficiencies in fore- 
casting the unusual case and inability to distinguish subtle 
differences in the synoptic-scale fields. 

Dynamic forecasts, on the other hand, have limitations
in both theory and cost. Due to the smallness of the Coriolis
parameter in tropical regions, a geostrophic relationship is
not feasible. This makes initialization of fields difficult
and increases the probability that any erroneous data points
will deteriorate the numerical forecast rapidly. Convective
heating is a driving mechanism for development of tropical
storms, rather than baroclinic instability as in the mid-latitudes.
Unfortunately, convective heating is very difficult
to model (Haltiner and Williams, 1980). Therefore, the governing
equations are suspect in the tropics, due to poor
initialization and modeling of convective heating. An even
greater problem is that interaction between different scales 
of motion is critical to maintain an energy balance in the 
tropical cyclone. If the grid spacing is not small enough, 
the energy balance will be altered, and possibly give spurious 
solutions. For this reason, the grid must have very fine 
resolution to simulate numerically this interaction. The cost 
of numerical integration on a fine grid can be very large due 
to the Courant-Friedrichs-Lewy (CFL) condition, which requires
smaller integration time steps as the grid spacing decreases 
(Haltiner and Williams, 1980). An additional problem with a 
fine grid model is that there are generally inadequate wind 
and mass observations to initialize the numerical model in the 
tropics, and this problem is increased as the grid size is 
reduced. 
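
As a rough illustration of the cost argument above, the following sketch applies the one-dimensional form of the CFL bound, dt <= C * dx / c. The wave speed (300 m/s), the Courant number of 1, the grid spacings and the 72-hour forecast length are all assumed values chosen for illustration; none of them is taken from the models cited in the text.

```python
import math


def max_time_step(dx_m, wave_speed_ms, courant=1.0):
    """Largest stable time step (s) under the 1-D CFL bound dt <= C * dx / c."""
    return courant * dx_m / wave_speed_ms


def relative_cost(dx_m, ref_dx_m):
    """Cost relative to a reference grid on a fixed horizontal domain.

    Refining by a factor r multiplies the points in each horizontal
    direction by r and the number of time steps by r, hence r ** 3.
    """
    refinement = ref_dx_m / dx_m
    return refinement ** 3


FORECAST_SECONDS = 72 * 3600   # assumed forecast length
WAVE_SPEED_MS = 300.0          # assumed fastest wave speed

for dx_km in (200.0, 100.0, 50.0):
    dt = max_time_step(dx_km * 1000.0, WAVE_SPEED_MS)
    steps = math.ceil(FORECAST_SECONDS / dt)
    cost = relative_cost(dx_km * 1000.0, 200.0 * 1000.0)
    print(f"dx={dx_km:5.0f} km  dt<={dt:6.1f} s  steps={steps:5d}  cost={cost:4.0f}x")
```

Under these assumptions each halving of the grid spacing halves the allowable time step and multiplies the work by about eight, which is the sense in which a fine grid becomes expensive.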

With the difficulties in both types of forecasting methods, 
an alternative method is proposed here. This study will employ
Empirical Orthogonal Functions (EOF's) to represent
numerically the large scale synoptic (dynamic) fields. Then, 
these functions will be used to forecast the tropical storm 
movement using regression equations. This approach is novel 
for forecasting of tropical storm movement, in the sense that 
previous regression analysis methods (Leftwich and Neumann, 
1977, for example) have not incorporated the entire synoptic 
forcing field. If an attempt to develop a simple linear regression
model using a large synoptic field is made, the number
of predictors becomes prohibitive, as each grid point value
relative to the storm would be a predictor. Early analog studies 
used only a single feature from the synoptic chart, such as 
the 700mb trough longitude to the north of the storm, to repre- 
sent the synoptic field. This study will use the Empirical 
Orthogonal Function representation of the entire synoptic forcing 
field around the tropical storm. Therefore, in a broad sense, 
this approach may be thought of as a dynamically-based statistical
forecast scheme. This type of approach is not totally
without precedent. Lorenz (1977) states:

    In an informal conversation in which this writer
    (Lorenz) took part in about 20 years ago, the
    question arose as to how the best system for producing
    the operational objective 24 h prog could
    be developed, if the system had to be ready within
    one year. We more or less agreed that the further
    improvements in numerical weather prediction to be
    expected in a single year would be small, and that
    the greatest gains would come from an empirical
    scheme in which the numerically produced prognostic
    charts, or "numerical progs" would enter as
    predictors....

Substitution of "improved tropical forecast scheme" for "24 h
prog" in the quotation gives the basis and purpose of this
study.

Empirical Orthogonal Function analysis allows a field with
many grid points to be represented by a linear combination of
a few constant vectors and variable coefficients, while retaining
a large portion of the total variation (from the mean
state) in the field. Thus, a synoptic field with many grid
points may be accurately represented by only a few variable
coefficients (given the vectors are constant), which makes the
technique ideal to use with regression analysis. For example,
Kutzbach (1967) was able to represent 88% of the total variation
in average January temperatures at 23 stations (grid points)
in North America over a 25-year period by using only five
coefficients and constant vectors. That is, the entire synoptic
scale chart of mean temperature was represented by a 23
element vector, and all of the data were stored in 25 individual
23-element vectors. Thus, Kutzbach was able to reduce
the number of vectors needed to describe the January temperature
field for each year (at the 23 locations) from 25 to 5.

The Empirical Orthogonal Function analysis in this study
is used for data reduction and representing synoptic fields
numerically. The synoptic-scale forcing upon the tropical
storm may be represented by only a few coefficients obtained
from the analysis. These coefficients may be then used to
forecast statistically the tropical storm movement. In this
manner, the synoptic (dynamic) forcing is incorporated into
the statistical forecasting scheme. Thus, the primary purpose
of this study is to investigate the role of the synoptic
forcing and to forecast tropical storm movement from this
forcing.


II. DATA ACQUISITION AND FIELD DEFINITION


The tropical cyclone tracks and height data used in this 
study are identical to those used by Brown (1981). The data 
required for an individual case include D-value fields at 850, 
700 and 500mb and the storm location history prior to and 
after the forecast time. A relocatable 120-point grid is 
defined with 5-degree grid spacing in both longitude and lati- 
tude. The grid covers an areal extent of 70 degrees east to 
west and 35 degrees north to south. Individual grid points 
are numbered as shown in Fig. 2-1. The grid is moved each 
map time such that the tropical storm is always located at
grid point 70. A moveable grid can create difficulty in ob- 
taining composite variable fields due to the longitude con- 
vergence as the storm moves further north. For this study, 
this problem is assumed to be of minor importance, and any 
composite type fields are computed assuming a flat earth. It
will be shown below that this assumption is reasonable over
the domain used in this study.

D-values are defined (Huschke, 1959) as height deviations 
(in meters) from the standard atmosphere height at a constant 
pressure surface, and are typically positive in the tropics. 
The source of the data is the operational Fleet Numerical 
Oceanography Center's (FNOC) Northern Hemisphere (63 X 63) 
analyses at 850, 700 and 500mb. The following selection conditions
are required:

(1) A tropical cyclone of at least tropical storm (35
knots) intensity must be present west of 180°W; 

(2) The storm must persist at least 30 hours with tropical 
storm intensity or greater, as analyzed by the Joint Typhoon 
Warning Center (JTWC), Guam; 

(3) The storm must be located between 10° and 25°N. This
requirement was included to ensure that the grid did not extend
into the Southern Hemisphere and was not composed primarily
of mid-latitude D-values. Since the latitudinal domain
is limited, longitude convergence is not a significant
problem at the latitudes of the domain. The distance
from the western edge of the grid to the storm ranges
from 1772 nautical miles at 10°N to 1631 nautical miles at
25°N, to 1474 nautical miles at 35°N and finally to 1157
nautical miles at 50°N. This range of distance is considered
insignificant.

(4) Since the storm position is coupled with the upper 
level analysis, only storms existing at 0000 GMT and 1200 
GMT are considered; 

(5) A 36-hour separation between subsequent positions of 
the same storm is required to provide a quasi-independence
between cases. This independence is a critical considera- 
tion whenever statistical analysis is conducted. 
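
The distances quoted under criterion (3) can be reproduced with a small calculation. The sketch below assumes that the western edge of the grid lies 30 degrees of longitude west of the storm and that one degree of latitude is 60 nautical miles. The 30-degree span is not stated in the text; it is inferred here because it reproduces the quoted values.

```python
import math

NM_PER_DEGREE = 60.0         # nautical miles per degree of latitude (approximation)
WEST_EDGE_SPAN_DEG = 30.0    # assumed storm-to-western-edge span, inferred from the text


def west_edge_distance_nm(lat_deg):
    """East-west distance (n. mi.) spanned by the assumed 30 degrees at a latitude."""
    return WEST_EDGE_SPAN_DEG * NM_PER_DEGREE * math.cos(math.radians(lat_deg))


# Values quoted in the text, keyed by latitude in degrees north.
quoted_nm = {10: 1772, 25: 1631, 35: 1474, 50: 1157}

for lat, quoted in quoted_nm.items():
    computed = west_edge_distance_nm(lat)
    print(f"{lat:2d}N  computed={computed:7.1f}  quoted={quoted}")
```

Within the 10° to 25°N band allowed by criterion (3), the computed distance changes by roughly 8 percent, which is consistent with treating longitude convergence as minor over the domain.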

After defining the selection criteria (1) through (5), 
the JTWC Annual Typhoon reports from 1967 to 1976 were examined
to select potential cases. These particular years were chosen
because the FNOC Northern Hemispheric D-value fields were
available from Systems and Applied Sciences, Monterey, Cali- 
fornia, during these years. Examination of the JTWC reports 
yielded 560 potential cases meeting the criteria above. How- 
ever, only 540 cases had the required D-value data. Of these 
540, there were data problems with an additional 36 cases, 
leaving 504 valid cases. Archived D-value data were interpolated
to the 120-point movable grid by the method of Bessel
linear interpolation (Brown, 1981). The phrase "base time"
will be used to define the time of the initial D-value field, 
and therefore the forecast. The storm warning position from 
JTWC is used as the location at the base time and at all times 
prior to the base time, whereas the JTWC best-track position 
is used for verification positions. This is a significant 
difference from Brown (1981), who used only the best-track 
positions for all historical locations. Warning positions 
are used because they are the actual locations available at the 
time of forecast. The best-track positions are calculated
after the typhoon season, and are not available to the forecaster
in the field. Nevertheless, they are assumed to be
the optimal position and therefore the value that the forecast 
scheme tries to replicate. 

Storm warning positions are obtained at the base time and 
12, 24 and 36 hours prior to the base time. Best track posi- 
tions are gathered for future positions in 6-hour increments 
from the base time to 84 hours in the future. Therefore, a
storm with complete history has continuously available locations
for 120 consecutive hours. The set of three levels of D-value
fields, four warning positions and 15 best-track positions
comprises the entire set for each case. The number of cases
having X available prior warning positions and Y future best
track locations available is shown in Table 2-1. It is interesting
to note that while there are 504 valid cases meeting
criteria (1) through (5), only 401 cases have all 36 hours of
prior warning positions. Furthermore, only 185 cases have both
36-hour prior warning positions and 84-hour future best track
positions available. The number of storms with 36-hour prior
warning positions available increases to 298 cases
with 48-hour future best track locations and 401 storms with
30-hour future best track locations at tropical storm strength.
The number of cases with a full 36-hour history is important
when the regression equations are developed.
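
The position counts given above can be verified with a few lines of arithmetic. The sketch assumes that the 15 best-track positions include the base time itself (offset 0), since 6-hour steps from 6 to 84 hours alone would give only 14 positions. The variable names are illustrative.

```python
# Hour offsets relative to the base time (negative = before base time).
WARNING_OFFSETS_H = [0, -12, -24, -36]

# Assumption: the best-track series starts at the base time (offset 0).
BEST_TRACK_OFFSETS_H = list(range(0, 84 + 1, 6))

# Total span covered by a storm with complete history.
span_hours = max(BEST_TRACK_OFFSETS_H) - min(WARNING_OFFSETS_H)

print("warning positions:   ", len(WARNING_OFFSETS_H))
print("best-track positions:", len(BEST_TRACK_OFFSETS_H))
print("span in hours:       ", span_hours)
```

With that assumption the sketch gives four warning positions, 15 best-track positions and a 120-hour span, in agreement with the figures stated in the text.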

TABLE 2-1

The number of valid cases by prior JTWC warning positions
and future JTWC best track position. See text for details.

                        NUMBER OF CASES

FUTURE                     WITH BASE TIME AND
LOCATIONS                12 HOUR   24 HOUR   36 HOUR
AVAILABLE      TOTAL     PRIOR WARNING POSITIONS ONLY
(in hours)

     6          504        461       422       401
    12          504        461       422       401
    18          504        461       422       401
    24          504        461       422       401
    30          504        461       422       401
    36          480        439       400       379
    42          380        351       315       298
    48          380        351       315       298
    54          380        351       315       298
    60      [illegible]    325       291       274
    66          265        242       215       200
    72          265        242       215       200
    78          265        242       215       200
    84          265        221   [illegible]   185


Fig. 2-2. The mean (composite) D-value field at 500mb.
Isopleths are deviation in meters from
standard atmosphere. Storm is always located
at grid point 70 (X).

Fig. 2-3. The composite standard deviation D-value
field (in meters) at 500mb. The storm is
always located at grid point 70 (X).

Fig. 2-4. Similar to Fig. 2-2, except for 700mb.

Fig. 2-5. Similar to Fig. 2-3, except for 700mb.

Fig. 2-7. Similar to Fig. 2-3, except for 850mb.


The composite D-value fields at 500, 700 and 850mb using
all 504 cases are shown in Figs. 2-2, 2-4 and 2-6. Of interest
is the relatively small gradient in the tropics in the
500mb composite. This level has relatively little indication
of a tropical disturbance at grid point 70, since the 500mb
level is near the level at which the surface cyclone becomes
an upper-level anticyclone. The lower level (850 and 700mb)
charts show fairly strong gradients in the D-value field around
point 70. Figs. 2-3, 2-5 and 2-7 show the D-value standard
deviations for all three levels. As expected, the greatest
D-value variation is near the storm location and in the mid-latitude
westerlies to the north. These mean and standard
deviation fields are the fields used in normalizing the data
for each case, by grid point, for use in the Empirical
Orthogonal Function analysis. The 504 cases comprise the
data set from which the Empirical Orthogonal Functions will
be obtained.
III. EMPIRICAL ORTHOGONAL FUNCTIONS


A. BACKGROUND 

The terminology "Empirical Orthogonal Function" (EOF) was
introduced by Lorenz (1956). EOF analysis is a variation of
the statistical technique of principal components, which was
introduced in its current form by Hotelling (1933), based on
an idea of Pearson (1901). Before
delving into the mechanics of EOF analysis, the basic concepts 
and meaning of principal components will be presented geo- 
metrically. Geometric meanings presented for principal 
components are valid for EOF's, since EOF's differ from 
principal components only by a scaling factor. 

Principal components aid in explaining interrelations of 
individual variables acting on a larger field. Morrison (1967) 
presents a concise geometric interpretation of the method. 
Principal components may be drawn from data sets in any num- 
ber of dimensions, but their meaning is most easily seen in 
three-dimensional space. Suppose three variables (X1, X2, X3)
form a trivariate observation space. For example, X1, X2, and
X3 could be the 500mb D-value at gridpoints 1, 2 and 3
respectively. A large collection of simultaneously measured values
of the three variables could be plotted as in Fig. 3-1. The
shaded ellipsoid in the figure represents the scatter plot of
many observations of the three variables. The origin of the
axes is the mean value for each of the three variables. The

Fig. 3-1. An example of trivariate principal
components. See text for details
(Morrison, 1967).

first of the three principal components (there will generally
be three unique principal components in three dimensions) is
the major axis of the ellipsoid, denoted as Y1 in the figure.
In other words, the first principal component is the axis in 
space that explains the maximum variation from the origin in 
the three-dimensional space. For this reason, the term 
principal axes is sometimes used instead of principal com- 
ponents. It is easily seen that this first principal component 
can be represented by a vector (and the vector 180 degrees out 
of phase) originating at the origin. The second principal 
component is the minor axis (Y2) which describes the maximum
amount of variation in the ellipsoid that is not explained by 
the first component. The second principal component is also 
subject to the constraint that it be orthogonal to the first 
component. This is identical to saying the second principal
component is the largest minor axis which is orthogonal to
the major axis. The third principal component is the third
minor axis (Y3) which explains the remainder of the variation
of the ellipsoid. This component is subject to the constraint
that it be orthogonal to the first two components (axes). Thus
the three principal components explain the total variation in 
the observation ellipsoid. The components are simply orthogonal 
axes, in three dimensions. It is seen from this simplified 
example that the technique may be easily extended to application
in multiple dimensions. If the axes are defined by
vectors, it is straightforward to find orthogonal vectors by
standard methods. This orthogonality constraint simplifies
identification and interpretation.
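
The geometric picture above can be illustrated numerically. The
following is a minimal sketch, assuming NumPy and a synthetic
three-variable sample (the sample, the function name principal_axes
and the axis lengths are illustrative assumptions, not data from this
study). It finds the three principal axes as the eigenvectors of the
covariance matrix of the scatter.

```python
import numpy as np

def principal_axes(X):
    """Return eigenvalues (largest first) and principal axes (columns) of
    the covariance matrix of X, where each row of X is one observation."""
    deviations = X - X.mean(axis=0)           # origin at the mean of each variable
    cov = deviations.T @ deviations / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Hypothetical trivariate sample: an elongated ellipsoid of 2000 points.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3)) * np.array([5.0, 2.0, 0.5])

variances, axes = principal_axes(X)
print(variances / variances.sum())    # share of variation along Y1, Y2, Y3
print(np.round(axes.T @ axes, 6))     # identity matrix: the axes are orthogonal
```

The three shares sum to one, which corresponds to the statement that
the three components together explain the total variation in the
observation ellipsoid.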

In M-dimension space, there will be M (or occasionally 
fewer) orthogonal components, which are simply the orthogonal 
vectors in M space. If there are fewer than M unique com- 
ponents, the observation variables are overdefined, and two 
or more of the describing variables are perfectly correlated. 
If this is the case, one of these perfectly correlated varia- 
bles may be omitted with no loss of information. 

As mentioned, Lorenz (1956) introduced the terminology 
"Empirical Orthogonal Function", and made the application to 
the atmospheric sciences. The mathematical method used for 
finding the orthogonal components or vectors involves solution 
of the eigenvalue problem in M space. EOF's are simply princi- 
pal components that have not been scaled by the square root 
of the corresponding eigenvalue found in the solution. This 
subtle difference is really of little concern. It does cause 
a slight modification in the computations, and also slightly 
changes the interpretation of the results. This interpretation 
difference arises because the loadings (elements) of the solu- 
tion eigenvector (principal component) are nothing more than 
the correlation of the variables in a given dimension with the 
principal axis it defines (Anderson, 1958). No such easy 
interpretation of the loadings is possible with EOF's. This 
modification is not significant, and the salient points and
geometric interpretation valid for principal components are
likewise valid in EOF analysis; only the lengths of the
orthogonal vectors are different. The mathematical details
will be covered in the next section.
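
The scaling difference between EOF's and principal components may be
checked with a short numerical sketch. It assumes NumPy and a small
synthetic data set (four hypothetical variables, 500 cases); none of
the names below come from the original study. For normalized input,
the eigenvector scaled by the square root of its eigenvalue should
equal the correlation of each variable with that component.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 4 correlated variables (rows) observed in 500 cases (columns).
mixing = rng.normal(size=(4, 4))
A = mixing @ rng.normal(size=(4, 500))

# Normalize each row, then form the correlation matrix R = Z Z' / N.
Z = (A - A.mean(axis=1, keepdims=True)) / A.std(axis=1, keepdims=True)
N = Z.shape[1]
R = Z @ Z.T / N

eigvals, E = np.linalg.eigh(R)
eigvals, E = eigvals[::-1], E[:, ::-1]        # largest eigenvalue first

loadings = E * np.sqrt(eigvals)               # scaled (principal component) form
scores = E.T @ Z                              # coefficient of each vector, per case

# Correlation of variable m with component j, computed directly.
corr = np.array([[np.corrcoef(Z[m], scores[j])[0, 1] for j in range(4)]
                 for m in range(4)])
print(np.allclose(corr, loadings))
```

The unscaled columns of E are the EOF's; they point in the same
directions as the scaled loadings and differ only in length.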

EOF analysis normally has been used in two primary appli- 
cations in geophysical sciences. These are either as a map- 
typing tool, or as a tool for reducing dimensionality and 
explaining the variance structure of a large field. For 
example, Stidd (1967) uses EOF analysis to describe the varia- 
tion in average monthly rainfall in Nevada. In this paper, 
Stidd states: 

    eigenvectors might be regarded as an ultimate development
    in the use of orthogonal functions to describe
    patterns or arrays of data.

He goes on to show that annual precipitation in Nevada may be
described primarily by one of three basic "components". The
three are: (1) a winter maximum from large scale storms; 
(2) a secondary peak during the summer due to thunderstorms; 
and (3) a small effect due to the removal and inclusion of 
water into the hydrological structure due to snow pack. EOF 
analysis allows extraction of each component and allows the 
researcher to determine the primary variables driving each of 
the components. Additionally, by using a linear combination 
of the eigenvectors (components), it is possible to determine 
and estimate the rainfall amount in data sparse and non-observed 
regions. This estimation is done by interpolation of coefficients
associated with each eigenvector. These coefficients
will be explained more fully in the next section. Stidd was
able to explain 93% of the total variance in the annual rainfall
in Nevada by using only three eigenvectors and coefficients.
This is compared to the initial estimation which required 12
charts (one for each month). The key points are that Stidd 
was able to both isolate the causes behind annual variation 
in Nevada rainfall (over all locations in Nevada), and addi- 
tionally, reduce the data required to make this estimate by 
75% (from 12 charts to three). This "gleaning of the forcing
pattern" and data reduction use of EOF's has been frequent
in meteorological applications. Other examples of
EOF use in this manner are found in Rinne and Karhila (1979), 
and Craddock and Flood (1969). 

Another application of EOF analysis has been for map typ- 
ing. Brown (1981) uses EOF analysis to divide a large sample 
of cases into smaller discrete subsets by map typing based on 
the coefficients derived from EOF analysis. The primary objective
was to use the subsets of similar cases to form analogue-type
forecasts of tropical cyclone tracks. Accuracy of forecasts
using this map typing scheme is generally lower than with
other objective tropical cyclone motion forecasting techniques.


B. MECHANICS OF THE EOF METHOD 

The mechanics of EOF analysis presented here follow an
elegant treatment by Kutzbach (1967). The notation used in
this development is defined as follows: a single underscored
variable in lower case letters is a vector (e.g., e), an
uppercase variable with two underscores is a matrix (A), and
a primed vector or matrix is the transpose (e'). The raw
data field (in this study, the 120 grid point fields of
D-values) is formed into a matrix, A. This matrix is constructed
so that each column consists of the 120 observed
D-values for a particular data case. Each row represents the
D-values at the same grid point for all data cases. If there
are N separate data cases (storms), with each case having M
grid point values, A is an M X N matrix representing the
observed D-value fields. The objective of EOF analysis is to
determine the single vector (e) in M dimensions that best
represents all of the N observation vectors. This is equivalent
to saying that one wants to find the vector (e) that
minimizes the summed squared error of all observation vectors
compared to (e). Therefore, EOF analysis may be thought of
broadly as a multi-dimensional extension of a least squares
technique.

The matrix A may be constructed in one of three ways: 
with the actual data values; with the departure from mean 
data values; or with the normalized departure from mean values. 
There are advantages and disadvantages to using each type of 
initialization for the data matrix A. In the first case, the 
resultant EOF's will have magnitudes on the order of the actual 
data, and will effectively represent the actual component 
field. Morrison (1967) points out that this type of input 
matrix may be dangerous to use if the variables in the different
dimensions vary widely in magnitude. As seen in the mean
and standard deviation charts of the fields (Figs. 2-2 through
2-7), this could be a problem here, since the D-values are
generally quite a bit lower in the northern portion of the grid,
as well as having larger variation in the north. There are
systematic differences in magnitude at different points on
the grid (dimensions). Thus, the grid points with larger
values are given more weight than the grid points with smaller 
values, and some of the meaning of the resultant eigenvectors 
is lost. For this reason, this type of input data was not 
used. A second potential form for the data matrix A is to
have the elements consist of the deviations from the mean
value of a given dimension (row). This type of approach is 
more in line with the classical principal components approach. 
In this case, the eigenvectors are extracted from the covariance
matrix. This is really the main advantage to this form,
while the primary disadvantages are that the interpretation
of the resultant eigenvectors becomes muddled due to scaling
of the dimensions and, again, there is not equal weight between
dimensions if their magnitudes differ. The third choice for
the input data matrix form is to use normalized departures
from the mean. This has a disadvantage in that it may smooth
slightly the resultant eigenvectors (Kutzbach, 1967). This
approach was selected because the variations in all dimensions
are equally weighted in extracting the eigenvectors. In this
study, normalization is accomplished by subtracting the mean
value at that grid point (over all cases), and then dividing
by the standard deviation of that grid point over all cases;

$$ \hat{a}_{mn} = \frac{a_{mn} - \bar{a}_m}{s_{a_m}} $$

where:

    $\hat{a}_{mn}$  is the transformed data point
    $a_{mn}$        is the original data point (D-value)
    $\bar{a}_m$     is the mean of a at grid point m (taken
                    over all n cases)
    $s_{a_m}$       is the standard deviation of a at grid
                    point m (over all n cases).

Brown (1981) discusses in more detail various methods of
normalization transformations.
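
The normalization step can be sketched as follows, assuming NumPy and
a synthetic stand-in for the D-value matrix (the random values and the
function name normalize_by_grid_point are illustrative assumptions).
The population form of the standard deviation is also an assumption,
since the text does not state which form was used.

```python
import numpy as np

def normalize_by_grid_point(A):
    """Subtract the mean and divide by the standard deviation of each row
    (grid point), both taken over all columns (cases)."""
    mean = A.mean(axis=1, keepdims=True)
    std = A.std(axis=1, keepdims=True)     # population form (an assumption)
    return (A - mean) / std

# Hypothetical stand-in for the data matrix: M = 120 grid points, N = 504 cases.
rng = np.random.default_rng(2)
A_raw = rng.normal(loc=50.0, scale=20.0, size=(120, 504))
A_norm = normalize_by_grid_point(A_raw)

print(A_norm.shape)    # (120, 504)
```

After this step every row has zero mean and unit standard deviation,
so all grid points carry equal weight in the eigenvector extraction.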
After obtaining the normalized input data matrix A (over
all N cases), the next step is to maximize the quantity

$$ (\underline{e}'\underline{\underline{A}})^{2} N^{-1} / \underline{e}'\underline{e} , \tag{1} $$

(where, unless otherwise noted, any product of two vectors
or matrices is the dot (inner) product) under the constraint
that

$$ \underline{e}'\underline{e} = 1. \tag{2} $$

Equation (1) is the squared product of an arbitrary vector
e and the actual data vectors. Constraint (2) is made simply
to normalize the maximized product. This maximization of (1)
with constraint (2) may be rewritten:

$$ \max\{\gamma : \underline{e}'\underline{e} = 1\} \quad \text{where} \quad \gamma = (\underline{e}'\underline{\underline{A}})^{2} N^{-1}, \tag{3} $$
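or

$$ \max\{\gamma : \underline{e}'\underline{e} = 1\} \quad \text{where} \quad \gamma = \underline{e}'\underline{\underline{A}}\,\underline{\underline{A}}'\underline{e}\, N^{-1}. \tag{4} $$

Defining $\underline{\underline{R}} = \underline{\underline{A}}\,\underline{\underline{A}}' N^{-1}$, equation (4) may be written as

$$ \max\{\gamma : \underline{e}'\underline{e} = 1\} \quad \text{where} \quad \gamma = \underline{e}'\underline{\underline{R}}\,\underline{e}. \tag{5} $$

It is of interest to note that the form of R is the cross
product matrix if A is comprised of the actual data. However,
it is the covariance matrix, or the correlation matrix, if the
input matrix A has elements which are deviations from the
mean or normalized deviations from the mean, respectively.
Premultiplying both sides of equation (5) by e results in

$$ \underline{\underline{R}}\,\underline{e} = \underline{e}\,\gamma. \tag{6} $$
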
Morrison (1967) shows that maximization of $\gamma$ leads to the
requirement that $|R - \gamma I| = 0$, or else the solution is trivial.
Maximization of (6), therefore, yields the eigenvalue problem,
where $\gamma$ is the eigenvalue.

Equation (6) applies to maximization of one eigenvector
only. Since there are M dimensions in the original problem,
one wishes to maximize the explained variance in each of the
dimensions. Therefore, it is convenient to rewrite (6) for
all vectors in the M-space as

$$ \underline{\underline{R}}\,\underline{\underline{E}} = \underline{\underline{E}}\,\underline{\underline{\Gamma}}. \tag{7} $$

Here, E is an M X M matrix, rather than a vector as was the
case for (6). It turns out that the diagonal elements of $\Gamma$ are the
eigenvalues found solving $|R - \gamma I| = 0$. Each column of E
is an eigenvector associated with a single eigenvalue $\gamma_i$.
It follows from the definition of eigenvectors that they are
orthogonal (uncorrelated). Again, the necessary condition in
finding E is that E'E = I, the identity matrix.
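
The eigenvalue problem in (7) can be solved with a standard symmetric
eigensolver. The sketch below assumes NumPy; the small random matrix
and the function name eof_decomposition are hypothetical and serve only
to show that the computed E and eigenvalues satisfy (7) and the
condition E'E = I.

```python
import numpy as np

def eof_decomposition(A):
    """Solve R E = E Gamma for R = A A' / N.
    Returns the eigenvalues (largest first) and the matrix E whose
    columns are the corresponding eigenvectors."""
    N = A.shape[1]
    R = A @ A.T / N
    gamma, E = np.linalg.eigh(R)       # symmetric solver, ascending order
    order = np.argsort(gamma)[::-1]
    return gamma[order], E[:, order]

rng = np.random.default_rng(3)
A = rng.normal(size=(6, 200))          # hypothetical: M = 6 grid points, N = 200 cases
gamma, E = eof_decomposition(A)
R = A @ A.T / A.shape[1]

print(np.allclose(R @ E, E @ np.diag(gamma)))   # equation (7)
print(np.allclose(E.T @ E, np.eye(6)))          # orthonormal eigenvectors
```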

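Returning to the basic definition of R, it is seen by
substitution that

$$ \underline{\underline{A}}\,\underline{\underline{A}}' N^{-1}\,\underline{\underline{E}} = \underline{\underline{E}}\,\underline{\underline{\Gamma}}. \tag{8} $$
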
Morrison (1967) has shown that the eigenvector associated
with the largest eigenvalue ($\gamma_1$) is the vector that explains
the maximum variation in R. In fact, the first eigenvector
explains

$$ \gamma_1 \Big/ \sum_{i=1}^{m} \gamma_i \tag{9} $$

of the total variation in R. The variance unexplained by the
first (largest) eigenvector is the residual. The second
eigenvector is associated with the second largest eigenvalue,
and explains the maximum variation remaining in the residual
field, and is given by

$$ \gamma_2 \Big/ \sum_{i=1}^{m} \gamma_i. \tag{10} $$

Therefore, the first two eigenvectors together explain

$$ (\gamma_1 + \gamma_2) \Big/ \sum_{i=1}^{m} \gamma_i \tag{11} $$

of the total variation in R. The process continues with each
successive eigenvector describing the maximum remaining variation
in the residual field. The final eigenvector simply
accounts for any variation in the total mean field left unexplained by the
combination of all previous eigenvectors. As the last eigenvector
explains all of the remaining variation in the field,
the total variation in R is explained by all of the eigenvectors.
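
The fractions in (9) through (11) amount to dividing each eigenvalue
by the sum of all eigenvalues and accumulating the result. A minimal
sketch, assuming NumPy and a hypothetical set of four eigenvalues
(the values and the function name explained_fraction are illustrative
only):

```python
import numpy as np

def explained_fraction(gamma):
    """Fraction of the total variation explained by each eigenvector,
    gamma_j / sum_i gamma_i, and the running (cumulative) total."""
    gamma = np.asarray(gamma, dtype=float)
    fraction = gamma / gamma.sum()
    return fraction, np.cumsum(fraction)

# Hypothetical eigenvalues, ordered from largest to smallest.
fraction, cumulative = explained_fraction([6.0, 3.0, 0.75, 0.25])
print(fraction)      # first entry corresponds to equation (9)
print(cumulative)    # second entry corresponds to equation (11)
```

For these values the first eigenvector explains 60% and the first two
together explain 90% of the total variation.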

Any of the original fields (cases) may be obtained by
calculating the EOF coefficients. These coefficients (called
multipliers by Stidd, 1967, and others) are also orthogonal
and are found by defining:

$$ \underline{\underline{C}} = \underline{\underline{E}}'\underline{\underline{A}}. \tag{12} $$

Here, C is an M X N matrix. The nth column of the coefficient
matrix (C) is the orthogonal coefficient vector corresponding
to the nth case. The input data matrix A may be retrieved by

$$ \underline{\underline{A}} = \underline{\underline{E}}\,\underline{\underline{C}}, \tag{13} $$

which exactly replicates each data case in A. One of the
primary advantages of EOF analysis arises from the fact that 
the first few eigenvectors often describe a large portion of 
the total variance in a sample, depending on the structure 
and correlation in the field. One may quite accurately 
approximate the actual field by retaining only the largest 
few eigenvectors. Assuming 500 cases, the initial data matrix 
required to describe the synoptic fields is a 120 X 500 matrix, 
which has 60,000 elements. Using only the first 10 eigenvec- 
tors and orthogonal coefficients, the original fields may be 
represented accurately by multiplication of two matrices, 
the first a 120 X 10 matrix of truncated eigenvectors, and 
the second a 10 X 500 coefficient matrix. The total number of 
elements in both matrices is only 6,200. Since EOF analysis 
allows a high percentage of the total variation to be explained 
by only the largest few eigenvectors, it is seen that the data
may be accurately estimated using as little as 10% of the total 
number of data points. 
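
The coefficient calculation, the exact recovery of A and the storage
count quoted above can be sketched as follows. The sketch assumes
NumPy and a synthetic 120 X 500 field built from five hypothetical
patterns plus weak noise; it does not use the D-value data of this
study.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, k = 120, 500, 10
# Hypothetical fields with spatial structure: a few patterns plus weak noise.
patterns = rng.normal(size=(M, 5))
A = patterns @ rng.normal(size=(5, N)) + 0.1 * rng.normal(size=(M, N))

gamma, E = np.linalg.eigh(A @ A.T / N)
gamma, E = gamma[::-1], E[:, ::-1]         # largest eigenvalue first

C = E.T @ A                                # coefficients, an M x N matrix
A_exact = E @ C                            # all M eigenvectors: exact recovery
A_approx = E[:, :k] @ C[:k, :]             # first k eigenvectors only

stored = E[:, :k].size + C[:k, :].size     # 120*10 + 10*500
print(stored, A.size)                      # 6200 60000
print(np.allclose(C @ C.T / N, np.diag(gamma)))   # coefficients are orthogonal
```

How closely A_approx matches A depends on how much of the variance the
first k eigenvectors carry, which is a property of the field and not
of the method.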

This significant reduction of dimensionality makes EOF's 
a prime tool to use for climatic estimation, and has been 
used as such by Horel (1981), Kidson (1975), Walsh and Mostek 
(1980) and Walsh and Richman (1981) among others. 

All N observed fields are represented by the linear
combination

$$ \underline{a}_n = \sum_{i=1}^{m} c_{in}\,\underline{e}_i , \qquad n = 1, \ldots, N \tag{14} $$

where $\underline{a}_n$ is the nth case. Thus each case may be represented
as a linear combination of the orthogonal coefficients and
elements of the eigenvectors. The first k eigenvectors
(k << m) generally represent a large portion of the total
variance in A. Keeping only the largest k eigenvectors, the
actual cases may be very closely approximated by:

$$ \underline{a}_n \approx \sum_{i=1}^{k} c_{in}\,\underline{e}_i , \qquad n = 1, \ldots, N \tag{15} $$

If one retains only significant eigenvectors, maximum information
may be retained with little complicating noise. This
leads to the obvious problem regarding the optimal number of
eigenvectors to keep.
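
The cost of truncation in (15) can be quantified: the squared error of
the k-eigenvector approximation, summed over grid points and averaged
over cases, equals the sum of the discarded eigenvalues. The sketch
below checks this numerically, assuming NumPy and a small hypothetical
random matrix (the function name truncation_error is illustrative).

```python
import numpy as np

def truncation_error(A, k):
    """Squared error of the k-eigenvector approximation, summed over grid
    points and averaged over cases, with the sum of discarded eigenvalues."""
    N = A.shape[1]
    gamma, E = np.linalg.eigh(A @ A.T / N)
    gamma, E = gamma[::-1], E[:, ::-1]
    A_k = E[:, :k] @ (E[:, :k].T @ A)          # equation (15) for every case
    error = np.sum((A - A_k) ** 2) / N
    return error, gamma[k:].sum()              # the two numbers should agree

rng = np.random.default_rng(5)
A = rng.normal(size=(8, 300))                  # hypothetical small example
error, discarded = truncation_error(A, k=3)
print(np.isclose(error, discarded))
```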


C. SELECTING THE NUMBER OF EIGENVECTORS 

In the previous section, it was demonstrated how a data 
field may be represented accurately by a linear combination 
of only a small number of eigenvectors and coefficients. The 
question of how many eigenvectors to retain is vital. Simply 
stated, the question is at what point does the linear combina- 
tion no longer add signal, but only describe noise in the data. 
Unfortunately, there is no single accepted answer to this 
question. Several possibilities are presented here. 

The classical principal component approach is outlined by
Morrison (1967), and assumes a very large, normally-distributed
sample for the data. In this case, the significant eigenvectors
may be identified by asymptotic behavior of the eigenvalues.
One seeks those eigenvectors whose eigenvalues are significantly
different from zero. Anderson (1963) has shown that sampling problems
using normalized data are much more complex than when non-normalized
departures from means are used. Therefore, the
initial development given here assumes non-normalized data,
because the mathematical description is easier to follow. When
the number of observations is very large, Anderson (1963)
shows the quantity $\sqrt{n}\,(d_i - \lambda_i)$ is distributed normally about a
zero mean, with variance of $2\lambda_i^2$. Here $d_i$ is the sample
population eigenvalue, $\lambda_i$ is the total population eigenvalue, and
n the number of cases. Further, Anderson shows the eigenvalues
are independent of each other. In this case, one may use a
confidence interval approach to determine if the eigenvalues
are significantly different from zero. If an eigenvalue is
not significantly different from zero, the associated eigenvector
describes only random noise. The confidence interval,
given by Morrison (1967), is:

$$ \frac{d_i}{1 + z_{\alpha/2}\sqrt{2/n}} \le \lambda_i \le \frac{d_i}{1 - z_{\alpha/2}\sqrt{2/n}} \tag{15} $$

where:

    $z_{\alpha/2}$ is the standard two tail z score (z = 1.96
    gives a 95% confidence interval).

The asymptotic decision rule is simply that the eigenvector is
discarded unless the lower limit in (15) is greater than zero.
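
The interval above is simple to evaluate. The sketch below uses only
the Python standard library; the sample eigenvalue of 4.0 is a
hypothetical value, and the function name eigenvalue_interval is an
illustrative assumption. The guard reflects that the upper limit is
undefined when the term z*sqrt(2/n) reaches one, which happens only
for very small n.

```python
import math

def eigenvalue_interval(d, n, z=1.96):
    """Large-sample confidence interval for a population eigenvalue, given
    the sample eigenvalue d and the number of cases n."""
    half_width = z * math.sqrt(2.0 / n)
    if half_width >= 1.0:
        raise ValueError("n is too small for the asymptotic interval")
    return d / (1.0 + half_width), d / (1.0 - half_width)

# Hypothetical sample eigenvalue of 4.0 from n = 504 cases.
lower, upper = eigenvalue_interval(4.0, 504)
print(round(lower, 2), round(upper, 2))    # 3.56 4.56
```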

While this method is sound theoretically, and works very
well for large data sets, Preisendorfer and Barnett (1977) 
point out that data sets used in meteorological (and oceano- 
graphic) studies are rarely of the size for which asymptotic 
behavior begins to emerge. In fact, Preisendorfer and Barnett 
suggest that a sample size on the order of 1000 cases may be 
required before asymptoticity applies. Since the data set 
used in this study is much below this size, the classical 
asymptotic selection approach for determining how many eigen- 
vectors to retain was not used. 

Another approach used throughout the literature (e.g., 
Rinne and Karhila, 1979) involves examination of the natural 
logarithm of the eigenvalue. This method is called the LEV 
(Logarithmic EigenValue) diagram method. The basis of this 
method is that the eigenvectors for those components that 
describe signal have a different structure than those that 
describe noise. Furthermore, it has been noticed that the 
structure change is most easily noted when natural logarithms 
of the eigenvalues are examined. To use the method, the eigen- 
values are first ordered, from largest to smallest. This 
method will work if there is a distinct change in slope of the 
ordered eigenvalues at some point. All eigenvalues larger 
than this slope change point are retained, and all smaller ones 
omitted. While this method apparently does well in some cases, 
and is exceedingly simple to use, it is not used in this study
for several reasons. First, it is not at all clear that a
break in the slope of the eigenvalues at some point is the
demarcation point between those eigenvalues that describe
signal and those that describe noise. Secondly, even assuming
the break in the eigenvalue slope does indeed mark the point
at which domination shifts from signal to noise, the method is
scientifically unsatisfying because there is little statistical
justification for its use.
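
For illustration only, the LEV procedure can be imitated numerically.
The diagram is normally inspected by eye; the rule used below (take the
point with the largest change in slope of the ordered log-eigenvalue
curve) is a hypothetical stand-in for that inspection, and the
eigenvalues are invented for the example.

```python
import math

def lev_break(eigenvalues):
    """Return the number of eigenvalues lying before the sharpest change in
    slope of the ordered log-eigenvalue curve (a hypothetical break rule)."""
    if len(eigenvalues) < 3:
        raise ValueError("at least three eigenvalues are required")
    ordered = sorted(eigenvalues, reverse=True)
    logs = [math.log(v) for v in ordered]
    slopes = [logs[i + 1] - logs[i] for i in range(len(logs) - 1)]
    # Change in slope at each interior point of the curve.
    bends = [abs(slopes[i + 1] - slopes[i]) for i in range(len(slopes) - 1)]
    sharpest = bends.index(max(bends))     # bend located at point sharpest + 1
    return sharpest + 1                    # eigenvalues larger than the break point

# Hypothetical spectrum: steep decay for three values, then a slow decay.
print(lev_break([40.0, 20.0, 10.0, 5.0, 4.5, 4.05, 3.645]))    # 3
```

The sketch makes the first objection above concrete: the rule returns
a number for any spectrum, whether or not the break has any physical
meaning.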

Another method that appears in the literature is to select 
the number of eigenvalues and vectors a priori, or select a 
percent total variance explained value as the cutoff point a 
priori. Richman (1980) presents several of these methods in 
detail. For example, Cattell (1958) recommends retaining 
all eigenvalues necessary to explain 99% of the total variance. 
Guttman (1954) recommends retention of all eigenvectors asso- 
ciated with eigenvalues larger than 1. Both of these methods 
in effect involve probable overfactoring. That is, use of 
these methods leads to keeping more eigenvectors than are 
actually required to adequately explain the data. This in 
and of itself is not serious unless the eigenvalues and vectors 
are rotated to better fit the clusters in space (see Richman, 
1981), but it does tend to defeat the purpose of EOF analysis. 
If overfactoring occurs, one does not receive maximum data 
reduction. Since the purpose of this study was to reduce 
the synoptic scale forcing fields to only a few easily separable 
components to aid in determining typhoon movement, underfactoring
is not a real problem.

Richman (1980) used a novel approach to determine how many
eigenvectors to retain. He also used rotation of components,
which is discussed in detail in the last section of this chapter.
His criterion was defined as "meaningfulness". That is,
if the component had apparent meaning (if the component field 
was interpretable synoptically), the component was retained. 
It has been demonstrated (for example, Craddock and Flood, 
1969) that higher order eigenvectors and components degenerate 
to little more than a series of uncorrelated high and low value 
regions. This means that there is some scientific justifica- 
tion to Richman's method. Nevertheless, it was not used here 
because it is entirely subjective, and therefore could give 
inconsistent results when used by different researchers. 

Brown (1981) used the method of retaining the number of 
components that explain a "reasonable amount" of the total 
variance. Specifically, using the same grid and data fields 
that are used in this study, he carried out experiments in 
map typing using the largest 10, 15 and 20 of the 120 eigen- 
vectors. This selection approach is rather arbitrary, since 
there is no objective way of distinguishing what the eigen- 
vectors are representing with respect to the signal-noise 
problem, and specifically, if any signal is being omitted. 

The final method, which is used in this study, is based on
a selection method introduced by Preisendorfer and Barnett
(1977). In essence, the scheme is a Monte Carlo approach to
determining the number of eigenvectors to keep. It is not
very different from the classic asymptotic approach described
by Morrison (1967). The main difference is that it is assumed
by Preisendorfer and Barnett that not enough cases are available
to use an asymptotic approach with geophysical data bases.
One key assumption is that the true (physical) variables are
normally distributed at all individual grid points. The simulation
input data are normally distributed, with mean zero,
variance one, which is just simulation of point normalized
data. Given these constraints, and using a large number (N > 100
is recommended by Preisendorfer and Barnett (1977)) of simulations,
one can create sufficient numbers of random fields to
simulate accurately the eigenvalues that result if the process
is purely random. In addition to calculating the mean value
of the simulated eigenvalue, the standard deviation of that
eigenvalue is calculated over the 100 or more simulations. If
the true physical eigenvalues deviate from the simulated random
field eigenvalues by more than two (three) standard deviations,
one is 95% (99%) confident that the field is significantly
different from a field that is purely random. In other words,
if deviation is by more than two standard deviations, one is
reasonably assured that the eigenvector is describing signal
rather than noise. The simulated eigenvalues obtained in this
study will be presented in the next chapter, along with the
eigenvalues obtained from analysis of the data. In using this
Monte Carlo method, 504 simulated 120 point random grids were
obtained. The eigenvalues of these random fields were found
and stored. This process was repeated 100 times to obtain the
simulated eigenvalues and standard deviations of the eigenvalues.
These were then compared to the true data eigenvalues.
One caution must be stated concerning use of this method.
Richman (1980) points out that this method has potential to
slightly underfactor. However, this is not of primary concern
here since the potential for underfactoring is only slight.
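
The Monte Carlo procedure described above can be sketched as follows.
This is not the code used in the study: it assumes NumPy, uses a small
hypothetical grid (20 points, 100 cases, 50 simulations) so that it
runs quickly, and the function names and test field are illustrative.
The study itself used 120 grid points, 504 cases and 100 simulations.

```python
import numpy as np

def monte_carlo_eigenvalues(M, N, n_sim=100, seed=0):
    """Mean and standard deviation of the ordered eigenvalues obtained from
    n_sim random M x N fields of independent N(0, 1) values."""
    rng = np.random.default_rng(seed)
    sims = np.empty((n_sim, M))
    for s in range(n_sim):
        field = rng.standard_normal((M, N))
        sims[s] = np.linalg.eigvalsh(field @ field.T / N)[::-1]
    return sims.mean(axis=0), sims.std(axis=0)

def count_significant(data_eigenvalues, sim_mean, sim_std, n_std=2.0):
    """Number of leading data eigenvalues exceeding the random-field
    eigenvalues by more than n_std standard deviations."""
    exceeds = np.asarray(data_eigenvalues) > sim_mean + n_std * sim_std
    return len(exceeds) if exceeds.all() else int(np.argmin(exceeds))

# Hypothetical test field: two strong patterns plus unit-variance noise.
M, N = 20, 100
rng = np.random.default_rng(7)
p1, p2 = np.ones(M), (-1.0) ** np.arange(M)
A = (3.0 * np.outer(p1, rng.standard_normal(N))
     + 2.0 * np.outer(p2, rng.standard_normal(N))
     + rng.standard_normal((M, N)))
A = (A - A.mean(axis=1, keepdims=True)) / A.std(axis=1, keepdims=True)
data_eig = np.linalg.eigvalsh(A @ A.T / N)[::-1]

sim_mean, sim_std = monte_carlo_eigenvalues(M, N, n_sim=50)
n_keep = count_significant(data_eig, sim_mean, sim_std)
print(n_keep)    # number of eigenvectors judged to describe signal
```

With two patterns built into the test field, the rule is expected to
retain the two leading eigenvectors and reject the rest as noise.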


D. ROTATION OF VECTORS 

Rotation methods seek to rotate the eigenvectors (axes)
in space to better fit data clusters. There is some controversy
(Richman, 1980) as to whether rotation of the
resultant components (eigenvectors) should be employed. Many
of the potential schemes have been surveyed in detail by 
Richman (1980), who describes some of the specific strengths 
and weaknesses of the schemes. 

A very simple example of rotation follows. Suppose that
two distinct data clusters are positioned (in Cartesian two-dimensional
space) at [illegible] and [illegible]. Following the method
outlined earlier in this chapter, the eigenvalues would then be
[illegible] (for non-normalized input data). The eigenvectors would
be [illegible] and [illegible] respectively. It is noted then that the first
eigenvector (which explains 90% of the total variance) bisects
the two data clusters in space. The second eigenvector does
not really fit the data clusters. Even the first eigenvector
does not give a true representation of the clusters in space.
Misrepresentation of this type may be eased by use of rotation.
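
The bisection effect can be reproduced with a small numerical sketch.
The cluster positions below are hypothetical (two unit vectors 45
degrees apart, occurring with equal frequency) and are not the values
of the example above, so the variance shares differ from those quoted
in the text. NumPy is assumed.

```python
import numpy as np

# Hypothetical cluster positions (not the values of the original example):
# two unit vectors 45 degrees apart, occurring with equal frequency.
cluster_1 = np.array([1.0, 0.0])
cluster_2 = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])
A = np.column_stack([cluster_1, cluster_2])      # 2 x N data matrix, N = 2

R = A @ A.T / A.shape[1]                         # cross product matrix (raw data)
gamma, E = np.linalg.eigh(R)
gamma, E = gamma[::-1], E[:, ::-1]               # largest eigenvalue first

first = E[:, 0] * np.sign(E[0, 0])               # fix the arbitrary sign
angle = np.degrees(np.arctan2(first[1], first[0]))
print(round(angle, 3))                           # 22.5, halfway between 0 and 45
print(np.round(gamma / gamma.sum(), 3))          # share of variation per vector
```

The first eigenvector lies midway between the two clusters and
coincides with neither, which is the misrepresentation that rotation
is intended to ease.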


The two broad classes of rotation that are employed are the
orthogonal and the oblique. Orthogonal rotation pivots the
eigenvectors identically so as to maintain the orthogonal 
relationship. It is seen in the simplified case just presented 
that an orthogonal rotation would never give a perfect repre- 
sentation of the input clusters, as the input clusters only 
have a 45° angle between them in the two dimensions, and are 
assumed to occur with equal frequency. Oblique rotation, on 
the other hand, pivots the vectors so as to most closely fit 
the data clusters without necessarily retaining the orthogonality
constraint. In the simplified case just presented, the
vectors would be pivoted (within a scaling factor) to [illegible] and
[illegible]. The vectors are no longer orthogonal, nor is it possible
to determine quantitatively the amount of total variation
explained by either of the vectors without exhaustive analysis. 
Richman (1981) uses pre-determined input fields to simulate 
the principal component processes. He then compares non- 
rotated components to both orthogonally and obliquely rotated 
components. His results show obliquely rotated components 
give vastly improved delineation of the input patterns. He 
then concludes that obliquely rotated components are a better 
tool to use for map typing than either orthogonally rotated 
or non-rotated components. If the purpose is to identify and
interpret all types of meteorological patterns that force 
another event, obliquely rotated components would appear to 
give superior results. 

Rotation was not used in this study for several reasons.
Delineation of patterns of meteorological features was not the
specific purpose of this research. EOF's were used in this
study for two purposes. First, they were used to obtain the 
orthogonal coefficients which are used in the formulation of 
regression equations to forecast tropical storm movement. 
Secondly, they were used to reduce the data. The first purpose
of the research makes physical identification and interpretation
of the resultant eigenvectors less critical. It is
the orthogonal coefficients derived from the linear combination 
of the eigenvectors that are used, not the actual eigenvectors 
themselves. Nevertheless, it is desirable to use the resultant 
eigenvectors with certainty to identify and interpret the forcing 
features. It is primarily due to the data reduction purpose
of this study that use of rotated components becomes less
attractive. Since the amount of explained variance (by each
component) is unknown after rotation, the question of how many
eigenvectors to retain becomes unclear. In fact, perhaps the
only valid criterion for retention becomes Richman's meaningfulness
criterion. In any case, the problem of determining how
many vectors to retain becomes much more difficult after rotation
has been employed.

An even more insidious problem with rotation of the vectors
is the effect of underfactoring on the resultant vectors.
Richman (1981) also experiments with underfactoring. If too
few vectors are retained and rotated, then the resultant
rotated vectors become combinations of vectors associated with
several actual input data clusters. Therefore, if underfactoring
exists, the same type of bisection that is seen in the worst
possible case with unrotated vectors may occur with the rotated
vectors. Since data reduction in this study is paramount,
rotation of components seems ill-advised at the present time.
As a final note, Richman's results, and the simplified 
results shown at the beginning of this section clearly show 
non-rotated components may not represent the true synoptic 
patterns. Conceptually, if the data clusters (input data) are 
not symmetric, errors in the EOF representation are less likely. 
This is perhaps most easily seen with a simplified example. 
If, for instance, in two dimensions, there are two data clus- 
ters occurring with equal frequency, one of the resultant 
eigenvectors will bisect the two clusters. This is the case 
in the simplified example above since the two cluster points 
were assumed to occur with equal frequency. If the clusters 
Gemnmot Occur equally, this bisection does not occur. Richman's 
Simulated fields were input in mirror-image pairs, with equal 
probability of occurrence. In this case, the resultant eigen- 
vector bisected the given input fields. True geophysical 
synoptic fields are not orthogonal in nature (Barry and Perry, 
1973 and others). On the other hand, it is anticipated that 
true geophysical fields do not come in matched opposite pairs 
that occur with similar frequency. It is for this reason that 
the first several unrotated vectors should indeed represent 


actual synoptic variability patterns. 
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IV. RESULTANT EMPIRICAL ORTHOGONAL FUNCTIONS 


The mathematical and theoretical framework for EOF analysis
was developed in Chapter III. In this chapter, the forcing
of each eigenvector on tropical storm movement is examined by
correlation of storm motion with the strength of the particular
vector for a given data case, which is given by the value of
the orthogonal coefficient associated with the vector. Before
any meaningful analysis of physical forcing on typhoon motion
may be attempted, the actual eigenvectors must be examined.

Following the mathematical development of Chapter III, the
120 X 504 data matrix was normalized at each grid point, and
the eigenvectors were obtained for all three data levels (500,
700 and 850mb). The resultant eigenvalues for all three levels
were then compared to the random eigenvalues generated from
Monte Carlo simulation using 100 simulations, as suggested by
Preisendorfer and Barnett (1977). These Monte Carlo eigenvalues
were all computed from 120 X 504 matrices whose elements
were random normal variables with a mean value of zero and a
standard deviation of one. Thus the statistical structure of
the random fields is identical to that of the normalized real
data fields. The eigenvalues for the three levels are
given in Table 4-1, which also gives the cumulative percent
of total variance explained by each successive eigenvector. Table
4-2 is a list of the randomly generated eigenvalues and their


standard deviations for comparable modes. If the real data
eigenvalue for a specific mode is greater than the random
eigenvalue plus twice the standard deviation, the eigenvalue
and corresponding eigenvector represent geophysical signal,
and the eigenvector is retained. To facilitate this comparison,
the value of the random eigenvalue plus twice the standard
deviation is also given in Table 4-2. The values of the standard
deviations in Table 4-2 are consistent with Preisendorfer
and Barnett's (1977) results. Comparisons of the three actual
field eigenvalues to those of the random field are conducted
separately, since the number of significant eigenvectors may
be different for each level. The only relationship between
the eigenvectors of the three levels comes from any dynamic
vertical coupling that may exist.


TABLE 4-1


Eigenvalues and cumulative percent explained variance (in
parentheses) for the normalized D-value at each level.


MODE     EIGENVALUE:   850MB          700MB          500MB

120                    .002 (100.0)   .001 (100.0)   .000 (100.0)


TABLE 4-2


Eigenvalues and standard deviations corresponding to the
modes in Table 4-1 as generated by the Monte Carlo method
(see description in text).


MODE     EIGENVALUE     STANDARD DEVIATION     EIGENVALUES PLUS
                                               TWICE STANDARD
                                               DEVIATION
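The retention test described above can be sketched in a few lines. The following is a minimal illustration, assuming NumPy and a small synthetic data set rather than the 120 X 504 D-value matrices of this study. The function names (`eof_eigenvalues`, `monte_carlo_threshold`, `n_retained`) and the synthetic field are hypothetical and are not taken from the original computations.

```python
import numpy as np


def eof_eigenvalues(data):
    """Eigenvalues (largest first) of the correlation matrix of `data`.

    `data` has shape (n_points, n_cases). Each grid point (row) is
    normalized to zero mean and unit standard deviation beforehand.
    """
    mean = data.mean(axis=1, keepdims=True)
    std = data.std(axis=1, keepdims=True)
    z = (data - mean) / std
    corr = z @ z.T / data.shape[1]
    return np.linalg.eigvalsh(corr)[::-1]


def monte_carlo_threshold(n_points, n_cases, n_sims=100, seed=0):
    """Mean and standard deviation of random-field eigenvalues, by mode."""
    rng = np.random.default_rng(seed)
    sims = np.array([
        eof_eigenvalues(rng.standard_normal((n_points, n_cases)))
        for _ in range(n_sims)
    ])
    return sims.mean(axis=0), sims.std(axis=0)


def n_retained(eigvals, rand_mean, rand_std):
    """Number of leading modes whose eigenvalue exceeds mean + 2 std."""
    passes = eigvals > rand_mean + 2.0 * rand_std
    return len(passes) if passes.all() else int(np.argmin(passes))


# Hypothetical synthetic field: three strong patterns plus weak noise.
rng = np.random.default_rng(1)
n_points, n_cases = 20, 200
patterns, _ = np.linalg.qr(rng.standard_normal((n_points, 3)))
amplitudes = rng.standard_normal((3, n_cases)) * np.array([[10.0], [9.0], [8.0]])
data = patterns @ amplitudes + 0.5 * rng.standard_normal((n_points, n_cases))

eigvals = eof_eigenvalues(data)
rand_mean, rand_std = monte_carlo_threshold(n_points, n_cases)
retained = n_retained(eigvals, rand_mean, rand_std)
print("modes retained:", retained)
```

Because the random matrices have the same dimensions and normalization as the data, each real eigenvalue is compared with the random eigenvalue of the same mode, which is the comparison Tables 4-1 and 4-2 are meant to support.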

Several interesting features emerge from examination of
the eigenvalues. The number of eigenvectors to retain is
different depending on the retention scheme chosen. For example,
Guttman's lower bound test suggests retention of the first 14
or 15 eigenvalues for these levels. Cattell's 99% retention
rule would indicate retention of more than 40 modes at each
level. The Preisendorfer and Barnett selection scheme is much
less conservative, and suggests retention of only 10
eigenvectors at 850 and 500mb and 11 at 700mb. Because the
Preisendorfer and Barnett method keeps fewer modes, the potential
for underfactoring increases. Since only 10 or 11 eigenvectors
are to be retained, roughly 15% of the variance in the fields is
directly attributable to random fluctuations (noise). This
amount of unexplained variance is not unrealistic in the
tropics. These errors are most likely due to either
initialization or measurement error in the fields. This is not
surprising because the initialization problem in the tropics is
difficult (weak governing mass-wind balance relationship).
Even more importantly, there is a very small gradient in the
geopotential field, except in the region near the tropical
storm. This would tend to give a greater weighting to any
observational error in the tropics, compared to the mid-latitudes,
where a linear balance initialization with quasi-geostrophic
constraints can be imposed to reduce errors in the height
fields. Since the areal extent of the grid incorporates a
large portion of the tropical synoptic forcing field (Fig. 2-1)
it is entirely conceivable that there is a 15% level of random
error in the D-value fields.

The 500mb eigenvalues from Table 4-1 are graphically
compared to the Monte Carlo simulated eigenvalues (Table 4-2) in
Fig. 4-1. The actual 500mb eigenvalues decrease
very rapidly with increasing mode, which indicates that a large
number of the components represent data clusters containing
random noise. Graphs of the 700 and 850mb eigenvalues are not
included because they are very similar to the 500mb values.

Preisendorfer and Barnett's assertion that asymptoticity
does not apply for a sample size of 504 data cases may also
be examined. If the asymptotic results are valid, the ratio
should be very nearly constant. Here $\bar{\lambda}_i$ is the
mean randomly generated ith eigenvalue, $s_i$ is the standard
deviation for the ith mode, n is the number of cases and m is
the number of grid points. The value of this ratio is given in
Table 4-3 for selected modes. It is seen that the ratio is not
constant, nor does it approach the theoretical value expected
for asymptoticity. Thus it is concluded that asymptotic
theory is not valid for this study.


TABLE 4-3 


Test parameter for the asymptotic theory of eigenvalues 
is shown for various modes (see text for details). 


MODE 1 2 5 10 15 20 40 60

Based on these tests for significant eigenvectors, it was
decided to retain the largest 10 eigenvectors for all levels.
These first 10 eigenvectors at 500mb are shown in Figs. 4-2
through 4-11 and will be examined in detail. The first 10
eigenvectors for both the 700 and 850mb level are shown in
Appendix A, without comment. The discussion of the first 10
eigenvectors at 500mb will include an interpretation of the
probable forcing that the particular pattern has on the
tropical storm, which is always at grid point 70.

The actual values of the eigenvectors in Figs. 4-2 through
4-11 are non-dimensional, since normalized data are used on
input. The broad scale forcing features of an eigenvector do
have meaning in the standard meteorological sense. Areas of
higher values of the eigenvector may properly be thought of
as high pressure (D-value) regions, areas of low elements as
low pressure regions, and more strongly packed isopleths
indicate stronger flow regions. Finally, it is stressed that
each eigenvector actually represents the pattern shown and the
exact inverse of the pattern shown. Relative gradients of the
patterns and positions of the closed isopleth features remain
unchanged for the positive or inverse eigenvectors. All
following discussion will be made using the eigenvector pattern
shown; the inverse case will not be discussed. Relevant features
for the inverse pattern may easily be obtained following
the same reasoning as below.

Eigenvector 1 (Fig. 4-2): This pattern shows a band of 
stronger easterlies directly to the north of the cyclone. 
Additionally, there is a slight northerly component to the flow 
directly upstream of the storm. The forcing of the tropical 
cyclone for this type of pattern should be to the west and 
south. 

Eigenvector 2 (Fig. 4-3): This component shows small
gradients throughout the field, as expected in the tropics. As with
pattern 1, a broad band of easterlies is seen to the north of
the storm, but they are much farther north than for pattern 1.
A primary difference between this component and the first
vector is that there appears to be a low centered south-southwest
of the storm, while this low was to the south-southeast for
vector 1. This component and component 1 both exhibit
properties of planetary scale waves, as they both have very low
wavenumber over the 70 degree longitudinal span of the chart.
This pattern should induce weak forcing to the west and to
the south.

Eigenvector 3 (Fig. 4-4): An entirely different type of
pattern compared to the first two components is seen here. The
vector has a fairly strong area of lower values to the west,
with a small higher valued area south-southeast of the storm.
Another small low is seen well to the northeast corner of the
pattern. Forcing on the storm should be to the north (strongly)
and east (weakly).

Eigenvector 4 (Fig. 4-5): The predominant feature of this
vector is a well developed low to the north and east of the
storm. The storm itself appears to be situated in a strong
flow region between a high and low. The forced motion should
be strongly to the east, with a weak drift to the south.

Eigenvector 5 (Fig. 4-6): A strong high valued area directly
to the north of the storm is the predominant feature in this
eigenvector. The pattern is essentially wavenumber 1 across
the 70 degree span of the chart. The physical analogue of
this vector is difficult to determine. It could well be that
this is a bisection of two distinct data clusters of high
pressure on the outer extremities of the grid, since this pattern
bears strong resemblance to the non-rotated bisection case
simulated by Richman (1981). In any case, the eigenvector is
usable with coefficients that appear in the formulation of
regression equations, and does indeed describe a global
wavenumber 5 pattern. This pattern should force tropical storms
to the west and north.

Eigenvector 6 (Fig. 4-7): This pattern is another
wavenumber 1 across the 70 degree longitude span of the grid
(global wavenumber 5). The dual low centers are generally
similar to the pattern in eigenvector 3. The forced motion
of the tropical cyclone should be to the west, with little
meridional forcing.

Eigenvector 7 (Fig. 4-8): The expected higher degree of 
complexity for higher order modes is beginning to show in 
this vector. Five well-defined high or low centers are seen 
in the pattern. This vector is approximately global wavenumber 
7, so that with this eigenvector the slow transition from 
large scale to smaller synoptic scales is beginning. The 
physical meaning of the pattern is also becoming more diffi- 
cult to define. The forcing of the storm should be weakly to 
the north and west. 

Eigenvector 8 (Fig. 4-9): As with eigenvector 7, there is 
a complex pattern of well-defined high and low value centers, 
with the storm located in the northern regions of a high 
center. Forcing to the east and south is anticipated from this 
pattern, although all forced motions should be weak. 

Eigenvector 9 (Fig. 4-10): Eigenvector 9 is somewhat
surprising since it has less complexity than the preceding two
eigenvectors. Nevertheless, it is approximately global
wavenumber 7. A strong blocking high center is found directly to
the west of the storm, while the storm itself is on the west
side of a weaker low. It is possible that the blocking high
pattern represents the effect of the 500mb anticyclone east
of the Tibetan Plateau heat low. Motions forced from this
pattern should be weakly to the south and east.

Eigenvector 10 (Fig. 4-11): The final eigenvector retained
in the truncated set of 10 is the most complex. A series of
well developed highs and lows are seen throughout the extent
of the grid. Short range forcing on the storm would come from
a high located south of the cyclone and two strong low centers
flanking the storm. The pattern is wavenumber 2 over the 70
degrees covered by the grid and corresponds to a global
wavenumber 10. This pattern defines even smaller synoptic scale
forcing than the previous patterns. Perhaps coincidentally,
the eigenvector 10 for the 700mb data set (Appendix A) is
virtually identical. This similarity indicates this pattern
is probably a true physical signal, which is vertically coupled
through the mid-troposphere. Motion forced from this pattern
will be to the south with little zonal forcing.

It is essential to show how these ten eigenvectors just
described would combine to represent the original field.
Selection of a case on 0000GMT 27 August 1967 was made at random
to demonstrate the reconstruction. At this time, Typhoon
Marge was located at approximately 18°N 125°E with maximum
winds of 125 knots. The actual 500mb D-value field is shown in
Fig. 4-12. The areal extent of the grid is from 43° to 8°N,
and 85° to 155°E. Therefore, this grid encompasses both
tropical and mid-latitude forcing on the storm. A linear
combination of the first ten eigenvectors and the associated
orthogonal coefficients should be adequate to represent the
relevant physical features according to the discussion in
Chapter III.B.

Among the salient features seen in the total field (Fig. 
4-12) is a strong blocking high pressure to the northwest of 
the typhoon, positioned at about 25°N, 100°E. A 500mb high 
pressure at this location is east of the Tibetan Plateau heat 
low which is a stationary feature of the planetary circulation. 
There is also a strong high pressure cell (D-values in excess 
of +320 meters) to the northeast of the typhoon. This second 
high pressure is the westward extension of the subtropical 
anticyclone over the western Pacific. Well to the north of 
the cyclone is a strong band of mid-latitude westerlies. A 
well-developed trough extends from the westerlies into the 
tropics and encircles the typhoon. 

As the input data have been normalized, the fields need
to be reconstructed using

$$\hat{d}_i = \left( \sum_{j=1}^{m} c_j e_{ij} \right) s_i + \bar{d}_i, \qquad i = 1, 2, \ldots, 120,$$

where m is the number of eigenvectors and orthogonal
coefficients used in the reconstruction, $c_j$ is the jth
orthogonal coefficient, $e_{ij}$ is the ith element of the jth
eigenvector, $\bar{d}_i$ and $s_i$ are the mean
and standard deviation of the D-value at the ith grid point,
and $\hat{d}_i$ is the reconstructed value.
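As an illustration of this reconstruction, the sketch below applies the same steps to a small synthetic data set using NumPy. It is a minimal example under stated assumptions, not the program used in this study: the function names `eof_decompose` and `reconstruct`, the 12-point grid and the random data are all hypothetical, and the orthogonal coefficients are assumed to be the projections of the normalized field onto orthonormal eigenvectors.

```python
import numpy as np


def eof_decompose(data):
    """Return mean, std, eigenvectors and coefficients for `data`.

    `data` has shape (n_points, n_cases). Eigenvectors are the columns
    of the returned matrix, ordered by decreasing eigenvalue.
    """
    mean = data.mean(axis=1)
    std = data.std(axis=1)
    z = (data - mean[:, None]) / std[:, None]      # normalized field
    corr = z @ z.T / data.shape[1]                 # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
    coeffs = eigvecs.T @ z                         # one column per case
    return mean, std, eigvecs, coeffs


def reconstruct(mean, std, eigvecs, coeffs, case, m):
    """Rebuild the field for one case from the first m modes."""
    z_hat = eigvecs[:, :m] @ coeffs[:m, case]      # truncated sum over modes
    return z_hat * std + mean                      # undo the normalization


# Hypothetical data: 12 grid points, 40 cases, arbitrary units.
rng = np.random.default_rng(0)
data = 50.0 + 20.0 * rng.standard_normal((12, 40))

mean, std, eigvecs, coeffs = eof_decompose(data)
partial = reconstruct(mean, std, eigvecs, coeffs, case=0, m=3)
full = reconstruct(mean, std, eigvecs, coeffs, case=0, m=12)
print("max error with 3 modes: ", np.abs(partial - data[:, 0]).max())
print("max error with 12 modes:", np.abs(full - data[:, 0]).max())
```

When all modes are used the original field is recovered to rounding error, whereas a truncated sum leaves a residual.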

The reconstructed field using only the first vector and 
coefficient (Fig. 4-13) shows westerlies well to the north 
with a ridge circling over the top of the storm from the east. 
The general features revealed by use of this eigenvector are 
the westerlies and high to the northwest. When the second 
and third vectors are included in the reconstruction (Fig. 
4-14), little information is gained. This is expected since 
these two patterns are not evident in the actual field. 

The inverse of the fourth eigenvector has similarities to 
the actual case being reconstructed. Both patterns show a 
high pressure to the northeast and northwest of the storm 
with a trough in the northern section of the grid. It is 
anticipated that addition of this eigenvector should greatly 
improve resolution of features on the reconstructed field. 
Changes in the field are evident on Fig. 4-15, but the overall 
resolution of the features is not dramatically improved. 
Nevertheless, inclusion of this vector does increase the high 
pressure cell to the northeast of the typhoon, and increases 
the gradient between the mid-latitude and tropical regions. 

The inverse of the fifth eigenvector also has many
similarities to the original field. A significant improvement in the
shape of the general features is seen after the fifth vector
is added (Fig. 4-16). A slight trough appears in the
mid-latitude westerlies and a coupling of the tropical and
mid-latitude trough is seen for the first time. Inclusion of the
next three eigenvectors (vectors 6 through 8) adds very little
to the reconstructed field, and these reconstructions are not
shown. Similarities
between eigenvector 9 and the original field include a sharp
trough in the westerlies which connects with a tropical trough
in the vicinity of the typhoon. When this eigenvector is added
to the linear combination of the previous eight, the broad
scale pattern (Fig. 4-17) is delineated much better. There is
general agreement in the positions of the large-scale features
and the gradients between them. Further refinement through use
of higher order modes is necessary to obtain the actual chart.
The difference between the patterns in Figs. 4-12 and 4-18 is,
according to the analysis here, simply random noise.
Nevertheless, with only the first nine eigenvectors the salient
features have emerged, and major forcing from the large scale
on the typhoon is defined. The continued progression in the
reconstructed fields using 10, 20 and 40 eigenvectors is shown
in Figs. 4-18 to 4-20. It is noted that the reconstructed
field is almost exact after 40 terms are included, and some
features due to random noise in the field are reproduced. The
correlation of the reconstructed field using various modes to
the original field is shown in Table 4-4. It is seen here that
the correlation of the two fields asymptotically approaches 1
as the number of modes in the reconstruction is increased.
Furthermore, large jumps in the correlation are seen when the
first and ninth eigenvectors are added, and smaller jumps are
seen with inclusion of the third and fourth vectors. This is
in agreement with the reconstruction shown above with the
exception that the fourth instead of the fifth eigenvector
seems to have a larger impact on the reconstruction.

Fig. 4-12. 500mb D-value (meters) field surrounding
Typhoon Marge at 0000GMT 27 August 1967.
Marge is located at 18°N 125°E (location X).

Fig. 4-13. Reconstruction of 500mb D-value field, 0000GMT
27 August 1967, using the first eigenvector and
orthogonal coefficient. This compares to the
true field (Fig. 4-12).

Fig. 4-14. Similar to Fig. 4-13, except first three
eigenvectors are used in reconstruction.

Fig. 4-15. Similar to Fig. 4-13, except first four
eigenvectors are used in reconstruction.

Fig. 4-16. Similar to Fig. 4-13, except first five
eigenvectors are used in reconstruction.

Fig. 4-17. Similar to Fig. 4-13, except first nine
eigenvectors are used in reconstruction.

Fig. 4-18. Similar to Fig. 4-13, except first ten
eigenvectors are used in reconstruction.

Fig. 4-20. Similar to Fig. 4-13, except first forty
eigenvectors are used in reconstruction.

Because inclusion of the eigenvectors 1, 3, 4, 5 and 9
seemed to have the greatest impact in the reconstruction, the
orthogonal coefficients associated with these eigenvectors
should have larger magnitudes than the other coefficients for
this case. The values of the first ten coefficients are shown
in Table 4-5. The coefficients associated with eigenvectors
1 and 9 are larger than the other coefficients. Although the
value of coefficient 5 is the third largest value, it is the
same magnitude as the coefficients associated with the second
and third eigenvectors. This is explained in that
eigenvector 2 tends to reinforce the pattern of the first vector,
while the third eigenvector reinforces the joint pattern of one
and two. The coefficient associated with the fourth
eigenvector is small for this case, indicating that this pattern
really had little effect on the reconstruction.
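The link between coefficient magnitude and impact on the reconstruction can be made explicit. The relation below is a sketch under two assumptions that are not restated in this chapter: the eigenvectors $e_j$ are orthonormal, and each orthogonal coefficient is the projection $c_j = e_j^{T} z$ of the normalized field $z$ for the case in question (the symbols are chosen here for illustration). With $\hat{z}^{(m)}$ denoting the reconstruction from the first $m$ modes,

```latex
\left\| z - \hat{z}^{(m)} \right\|^{2}
  = \Bigl\| \sum_{j=m+1}^{120} c_j e_j \Bigr\|^{2}
  = \sum_{j=m+1}^{120} c_j^{2} .
```

Under these assumptions, adding mode $j$ lowers the squared error of the normalized field by exactly $c_j^{2}$, so modes with small coefficients for a given case change the reconstruction very little.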


TABLE 4-4 


Correlation coefficient of the reconstructed field, using 
the number of modes indicated, with the actual field being 
reconstructed (see text). 


NUMBER OF 


MODES USED i 8 9 10 
CORRELATION. : : A : : saeco’ 66/34 )« «=. 80S) (wS6/ 


NUMBER OF 
MODES USED 20 30 40 50 60 120


CORRELATION .936 .974 .994 .993 .994 1.000


TABLE 4-5


Values for the first 10 orthogonal coefficients for the 
case of 27 August 1967. (See text for details). 


Coefficient 1 2 3 4 5 6 7


Value eo o0e =|. /Ov=ee2 —lee5 -1.03 -.75 





These ten orthogonal coefficients define the pattern, and
will be used shortly as predictors in regression equations for
forecasting tropical cyclone motion. The hypothesis is that
the forcing of typhoon motion may be determined from the
various eigenvector patterns. As a preliminary test of this
hypothesis, the zonal and meridional components of the typhoon motion
(in nautical miles for various times) are correlated with the
orthogonal coefficients associated with the eigenvectors
(obtained from base time field). The correlations are calculated
on 12-hour increments for the 12- to 84-hour displacement using
the Pearson product moment (Dixon and Brown, 1979). Because
the motion is defined to be positive to the north and to the
west, a positive correlation means increased north or west
forcing, relative to the mean displacement at a given time, with
an increase in the value (not magnitude) of the coefficients.
This holds for both the positive and negative (inverse)
coefficients in that an increase in value for a negative coefficient
(decrease in magnitude) decreases the south or east forcing,
or equivalently increases the north or west forcing. Each
coefficient contributes to the total forcing, and the total
movement is a summation of the forcing in all directions by all
eigenvectors. Correlations are obtained for a dependent set
of 454 cases (or fewer for longer time intervals). Assuming
the motion and orthogonal coefficients are both distributed
normally, Chatfield (1980) shows the distribution of
correlation coefficients for uncorrelated variables is distributed
N(0,1/N). This means that any correlation of magnitude less
than about .09 is not significant (at the 95% level). Tables 4-6 and
4-7 give the correlations for zonal and meridional motion,
respectively.
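The significance threshold quoted above can be checked with a short calculation. The sketch below assumes a two-sided test at the 95% level, so that the critical value is 1.96 standard deviations of the N(0,1/N) distribution; the function name `correlation_threshold` and the comparison sample of 150 cases are hypothetical choices for illustration.

```python
import math


def correlation_threshold(n_cases, z_crit=1.96):
    """Smallest |r| that is significant for the given sample size.

    For uncorrelated normal variables r is approximately N(0, 1/N),
    so its standard deviation is 1 / sqrt(N). The default z_crit of
    1.96 is an assumed two-sided 95% critical value.
    """
    return z_crit / math.sqrt(n_cases)


threshold_454 = correlation_threshold(454)   # full dependent sample
threshold_150 = correlation_threshold(150)   # a smaller sample, for comparison
print(f"N = 454: |r| must exceed {threshold_454:.3f}")
print(f"N = 150: |r| must exceed {threshold_150:.3f}")
```

For 454 cases this gives a value of about .09, as stated. The threshold rises as the sample shrinks, which matters for the longer time intervals where fewer cases are available.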

Most of the correlations agree nicely with the
instantaneous forcing of the eigenvectors inferred from Figs. 4-2
to 4-11, although there are surprises. Perhaps the largest
surprise is the shift in meridional forcing in eigenvector 1
as the time interval increases. For times less than 36 hours,
the forcing is the anticipated south forcing. The forcing
at 48 and 60 hours is not significant, indicating the strength
of this pattern at this time level gives little information on
resultant 48- and 60-hour meridional motion. Between 72 and
84 hours, the forcing of this eigenvector actually becomes
significantly northward from the mean 72 to 84 hour meridional
displacement. A possible explanation for this phenomenon is
that this pattern identifies recurving storms. During the
short term, the forcing is to the south, but even more strongly
to the west. The storm then crosses the mean meridional
displacement location after 48 to 60 hours, still well to the west
of the initial longitude. This is not to say the storm actually
moves north of the initial latitude, only that the storm moves


Table 4-6


Pearson product moment correlations between the
orthogonal coefficient associated with the given eigenvector
and the zonal motion at 12 hour increments. A positive
correlation implies westward forcing. Also included is the
instantaneous motion anticipated from the form of the
eigenvectors in Figs. 4-2 to 4-11.


MODE ANTICIPATED TIME INTERVAL
FORCING 12 24 36 48 60 72
1 WEST 4.506 +.530 +.553 +.477 +.495 +.358 + 
2 “AEST we = a0Gue~. Ooo ~.001 -.061 -.092 - 
5 EAGT -ealgo—. 1G35—.139 -.,074 ~.0469 =.009 + 
4 Eas. =o =~.412 -. 355 —~.37/3 -.371 -.3671 - 
2) Wii St trogen t guar. 209 +.,2592 “2221 +.284>+ 
6 WEST toot touo4u +. 059 —.043 -.037 -.090 - 
af WEST aio) eu —- Os —.077 -..098 —.058 - 
8 CAST es oe 6 COS —.20G ~.205 —. 2460 - 
g LISTLe aloe. Ogee. UNS —. 1S) ~.132 ~.125 - 
10 NE Stag oy —,ONG tf. Oligeet. O2Getnug | +2027 +.09 3) +


Table 4-7


Similar to Table 4-6, except for meridional motion
and positive correlation implies northward forcing.


MODE 


So WS Bo es Ss = SS > ca ae ED ae Qe Eee coe 


10 


ANTICIPATED 
FORCING 


NORTH 


SOUTH 


NORTH 


ESR AeA Eo, 


NORTH 


SOUTH 


es 


SOUTH 


— en 3 


+. 362 


Sram 


+.075 


Se ious, 


+ ele) 


+.084 


=—204 7 


-.141 


24 


=. 20m 


-.184 


eco 9 


=. 176 


+.Q34 


5 loys: 


+ .224 


+ .084 


=05 0 


-.176 




36 


-.242 


=D 


t.359 


-.141 


+.017 


-. 136 


f. 2072 


+, 071 


-.007 


care 207 


TIME INTERVAL 


48 


+.017 


=e) > 


co 


cas ia 


+.009 


-.068 


+.254 


+#.021 


#,155 


=. 262 


60 


+. 056 


Senlioc 


+,214 


ren 


-.005 


See 


+. 224 


+, O40 


+. 176 


72 


+.194 


=205 


+.178 


-.040 


+050 


=e 074 


Ae Ue: 


-.054 


#.210 


84 


+, 392 


-.164 


£2061 


-.012 


-. 047 


mar ha 


+. 0186 


-.003 


+. 194 


=.200 -.483 - 98 





north of the expected latitudinal position at around 48 hours,
and then remains north of the expected position. The westward
forcing throughout the entire period is not inconsistent with
recurvature, due to the large initial westward displacement.
By the 72 hour time, the storm is north and west of the mean
track displacement at that time, due only to coefficient 1
forcing. The storm displacement from the base time location
is shown in Fig. 4-21 for all cases that have a 500mb
coefficient 1 less than -9, while Fig. 4-22 is a graph of storm
displacement for those storms with a coefficient 1 greater
than +9. Recurvature is not seen immediately here, and more
sophisticated statistical analysis techniques are required to
verify the hypothesis presented above. Nevertheless, these
two graphs show very nicely how the movement correlates with
the coefficient value.

The other correlations shown in Tables 4-6 and 4-7 are
consistent with the inferred instantaneous motion obtained
from the eigenvectors. Eigenvectors 3 and 7 (along with 1)
have the largest correlation (forcing) on the meridional
motion. Eigenvector 1 has the greatest impact on the zonal
forcing, with vectors 4, 5 and 8 also showing significant
forcing. Surprisingly, eigenvectors 2 and 4 also correlated
significantly with the meridional motion. From the results
shown here, the anticipated forcing is in good agreement
with the actual motion, and justifies use of the
coefficients as predictors in regression equations for the storm


V. REGRESSION ANALYSIS


In the preceding chapter, it was demonstrated that the
orthogonal coefficients associated with eigenvectors give
qualitative insight to physical forcing mechanisms acting on
tropical storms. Therefore, it is hypothesized that it is
possible to use these coefficients to forecast quantitatively
tropical storm motion. A regression approach is appropriate
to investigate this hypothesis. Very briefly, regression
analysis involves using a linear combination of known
quantities (predictors) to estimate the value of an unknown
quantity (predictand). Dixon and Brown (1979) give a concise
summary of regression analysis, while Neter and Wasserman
(1974) provide theoretical background of the technique. In
the initial portion of this chapter, the model is developed,
with model results appearing at the end of the chapter.
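A minimal sketch of such a regression is given below, using NumPy least squares on synthetic data. The predictor values, the coefficients in `true_beta` and the function names are hypothetical; the example only illustrates how a predictand is estimated from a linear combination of predictors and does not reproduce the equations developed in this chapter.

```python
import numpy as np


def fit_regression(predictors, predictand):
    """Least-squares coefficients; the first element is the intercept."""
    design = np.column_stack([np.ones(len(predictand)), predictors])
    beta, *_ = np.linalg.lstsq(design, predictand, rcond=None)
    return beta


def predict(beta, predictors):
    """Estimate the predictand from new predictor values."""
    design = np.column_stack([np.ones(predictors.shape[0]), predictors])
    return design @ beta


# Hypothetical example: 3 predictors (e.g. orthogonal coefficients),
# 100 dependent cases and a noise-free linear predictand.
rng = np.random.default_rng(0)
predictors = rng.standard_normal((100, 3))
true_beta = np.array([5.0, 2.0, -1.0, 0.5])      # intercept first
predictand = true_beta[0] + predictors @ true_beta[1:]

beta = fit_regression(predictors, predictand)
forecast = predict(beta, predictors[:5])
print("fitted coefficients:", np.round(beta, 3))
```

With noise-free synthetic data the fitted coefficients match the ones used to generate the predictand. With real displacements the fit would only be approximate, which is why an independent sample is needed for verification.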

It was decided that of the 504 total data cases available,
50 would be used as independent cases to test the resultant
equations. Use of 50 cases for the independent data set file
is arbitrary, but still leaves a large dependent data set. In
the initial set of 504 cases, 185 cases had both complete
past histories (warning positions 36 hours prior to the base
time) and best track positions that extended to 84 hours
beyond the base time. Of these 185 cases, it was decided to
hold 35 cases to comprise part of the independent set, leaving
150 cases with full history in the dependent set. The remaining
15 independent cases were selected from the remaining cases
without complete history. All cases in the independent data set
were selected randomly within their respective history subsets.
This process left 454 potential cases over which the
regression equations were formed. The fifty independent cases
are shown in Table 5-1. It will be shown shortly that the
actual number of cases used to derive the regression equations
is less than 454, due to the specifications of the
predictors.
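The holdout described above can be sketched as a random draw within each history subset. This is only an illustration under stated assumptions: the case identifiers, the `split_cases` helper and the random seed are hypothetical, and only the counts (504, 185, 35 and 15) come from the text.

```python
import random


def split_cases(full_history, partial_history, n_full=35, n_partial=15, seed=0):
    """Randomly hold out independent cases within each history subset."""
    rng = random.Random(seed)  # hypothetical seed, used only for repeatability
    held_full = set(rng.sample(full_history, n_full))
    held_partial = set(rng.sample(partial_history, n_partial))
    independent = sorted(held_full | held_partial)
    dependent = [c for c in full_history + partial_history
                 if c not in held_full and c not in held_partial]
    return dependent, independent


# Hypothetical case identifiers: 185 cases with complete history, 319 without.
full_history = list(range(185))
partial_history = list(range(185, 504))

dependent, independent = split_cases(full_history, partial_history)
dependent_full = [c for c in dependent if c < 185]

# Dependent cases, independent cases, dependent cases with full history.
print(len(dependent), len(independent), len(dependent_full))
```

The three counts printed correspond to the 454 potential dependent cases, the 50 independent cases, and the 150 dependent cases with full history.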

Predictands for this study are the 12- to 84-h zonal and 
meridional displacements of the storms in 12-hour increments. 
These distances are determined from the base time JTWC warn- 
ing position to the JTWC best-track position at the predic- 
tand time. Positive motion is defined to the north and to the 
west, since the majority of the displacements are to the north 
and west. As there are 14 predictands, 14 regression equa- 
tions are required for each of the three pressure levels for 
which synoptic data are available. Because the basic data 
are only available at 12-hour intervals, and the analyzed maps 
are delayed several hours, the forecast time must be carefully 
distinguished from the guidance time. A 12-h forecast based
on 0000GMT data is the forecast position valid at 1200GMT,
whereas a 12-h guidance based on the 0000GMT data would be
issued several hours after 0000GMT and would be valid 12 hours
after issuance. It is estimated that four hours would be
needed to prepare and issue the forecast. Hence, a forecast
issued based on 0000GMT data could only be used in preparing
the 0400GMT guidance. A 12-h guidance will then be valid at
1600GMT. To ensure that an estimate of the position during
the next 72 hours is always available, forecasts are made to
84 h after the base time. All subsequent references to
times will be for forecast rather than guidance timing.
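The distinction between forecast and guidance timing is simple clock arithmetic, sketched below. The function names and the calendar date are hypothetical; the four-hour preparation delay and the 12-hour data interval are taken from the text.

```python
import math
from datetime import datetime, timedelta

PREP_DELAY_H = 4  # estimated hours needed to prepare and issue the guidance


def forecast_valid_time(base_time, lead_h):
    """Valid time of a forecast counted from the data base time."""
    return base_time + timedelta(hours=lead_h)


def guidance_valid_time(base_time, lead_h):
    """Valid time of guidance counted from the issue time (base + delay)."""
    return base_time + timedelta(hours=PREP_DELAY_H + lead_h)


def forecast_lead_needed(guidance_lead_h, step_h=12):
    """Smallest forecast lead, in 12-h steps, that covers a guidance lead."""
    return math.ceil((guidance_lead_h + PREP_DELAY_H) / step_h) * step_h


base = datetime(1976, 1, 1, 0, 0)  # hypothetical 0000GMT base time
print(forecast_valid_time(base, 12).strftime("%H%MGMT"))  # 12-h forecast
print(guidance_valid_time(base, 12).strftime("%H%MGMT"))  # 12-h guidance
print(forecast_lead_needed(72))                           # lead needed for 72-h guidance
```

With these assumptions a 72-h guidance requires a position 76 h after the base time, which rounds up to the 84-h forecast used in this study.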

The potential predictors are identical for all of the 14 
regression equations, with the exception of any predictors 
that are a function of atmospheric level. Predictors are 
sought to assess quantitatively the effect of three different 
features on storm movement: external (to the storm) physical 
forcing, previous movement of the storm, and storm intensity. 
Synoptic (and sub-synoptic) external forcing on the storm is
thought to play a large role in storm movement (Brown, 1981
and others). To incorporate the forcing quantitatively, the
orthogonal coefficients associated with the 10 retained eigenvectors
for a particular data case are selected as potential
predictors. One of the primary objectives in this study is
to determine how well these EOFs represent large-scale
features.

If the storm is to be forecast properly, prior motion must
also be accounted for (Peterson, 1980). It is necessary to
know the direction toward which the storm is moving to determine
what portion of the external forcing will be affecting the
storm. To do this, twelve additional variables representing
past zonal and meridional displacements are added to the set
of potential predictors. All of the prior storm displacements
are based on warning positions to simulate operational
conditions. The six variables for zonal motion are the prior
12, 24 and 36 hour zonal displacements of the storm, the zonal
displacements from 12 hours to 24 and 36 hours prior, and
finally the zonal displacement from 24 to 36 hours prior to
the base time. The time frames for the meridional displacements
are identical.
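The six past-displacement variables for one component can be written out explicitly. The sketch below is illustrative only: the function name, the dictionary keys and the position values are hypothetical, and positions are assumed to be expressed as distances (nautical miles) along one axis, positive toward the west or north.

```python
def displacement_predictors(pos_0, pos_12, pos_24, pos_36):
    """Six displacement predictors for one component (zonal or meridional).

    pos_0 is the warning position at base time; pos_12, pos_24 and pos_36
    are the warning positions 12, 24 and 36 hours before base time.
    """
    return {
        "prior_12h": pos_0 - pos_12,
        "prior_24h": pos_0 - pos_24,
        "prior_36h": pos_0 - pos_36,
        "from_24_to_12h": pos_12 - pos_24,
        "from_36_to_12h": pos_12 - pos_36,
        "from_36_to_24h": pos_24 - pos_36,
    }


# Hypothetical zonal positions (n mi) at base time and 12, 24, 36 h earlier.
zonal = displacement_predictors(pos_0=300.0, pos_12=210.0, pos_24=130.0, pos_36=60.0)
for name, value in zonal.items():
    print(name, value)
```

All six variables are sums of the three consecutive 12-h steps (for example, the prior 24-h displacement is the prior 12-h displacement plus the displacement from 24 to 12 hours), so they are strongly interrelated by construction.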

Storm intensity is the third storm characteristic to be
assessed quantitatively. The preferred form of these
data would be a meso- or microscale analysis of the winds around
the storm. Since this is not available, the JTWC warning
maximum winds are used to indicate intensity. The intensity
data are available for the base time, and at 12, 24 and 36 
hours prior to base time. Therefore, the complete set of 
potential predictors includes four predictors for intensity, 
12 for past movement and 10 for the physical forcing. Table 
5-2 is a listing of the 26 potential predictors, along with 
the names used to identify each predictor in this study. For 
a data case to be used in the formulation of the regression 
equations, a complete set of potential predictors and the 
proper predictand had to be available. This decreased the num- 
ber of cases available for computation of the regression equa- 
tions. Actual valid case numbers are presented with the 
results of the regression. Since the number of potential 
predictors is initially large, the resultant equations need
to be examined carefully to determine if any of these predictors
may be excluded with little information loss. It is
desirable to have as few potential predictors as possible.
Therefore, if it is determined that any of the potential
predictors add little to the equations, they should be dropped
from the developmental set, and the equations should be
rederived over the smaller set of predictors.

TABLE 5-2

The orthogonal coefficient associated with eigenvector 1.
The orthogonal coefficient associated with eigenvector 2.
The orthogonal coefficient associated with eigenvector 3.
The orthogonal coefficient associated with eigenvector 4.
The orthogonal coefficient associated with eigenvector 5.
The orthogonal coefficient associated with eigenvector 6.
The orthogonal coefficient associated with eigenvector 7.
The orthogonal coefficient associated with eigenvector 8.
The orthogonal coefficient associated with eigenvector 9.
The orthogonal coefficient associated with eigenvector 10.
Storm latitude movement for 12 hours before base time.
Storm latitude movement for 24 hours before base time.
Storm latitude movement for 36 hours before base time.
Storm latitude movement from 24 to 12 hours before base time.
Storm latitude movement from 36 to 12 hours before base time.
Storm latitude movement from 36 to 24 hours before base time.
Storm longitude movement for 12 hours before base time.
Storm longitude movement for 24 hours before base time.
Storm longitude movement for 36 hours before base time.
Storm longitude movement from 24 to 12 hours before base time.
Storm longitude movement from 36 to 12 hours before base time.
Storm longitude movement from 36 to 24 hours before base time.
Storm warning maximum wind at forecast base time.
Storm warning maximum wind 12 hours prior to base time.
Storm warning maximum wind 24 hours prior to base time.
Storm warning maximum wind 36 hours prior to base time.

The next decision is how to use the predictors to create 
the equations. Two primary possibilities exist: all possible 
predictors or stepwise regression. All possible predictor 
regressions use all predictors at once to form the regression 
equations. In this study, all 26 predictors would be used 
to formulate the equations. A stepwise regression creates 
the regression equations by adding (or deleting) one predictor 
per step. At each step, the single predictor that is most 
highly correlated with any residual error from the previous 
step is added to the predictors used, and the equations (and 
residuals) recomputed. This process continues until no addi- 
tional predictors meet a pre-assigned significance tolerance 
level. Dixon and Brown (1979) give further details of the 
procedure. Typically, not all potential predictors are used. 
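The forward part of the stepwise procedure described above can be sketched in a few lines. This is not the BMDP routine: it is a minimal forward-only version (no deletion step), written under the assumption that the entry test is an F-ratio comparing the drop in residual sum of squares with the remaining residual variance. The function names and the test data are hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _center(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]


def forward_stepwise(columns, y, f_to_enter=4.0):
    """Return predictor names in the order they enter the equation."""
    n = len(y)
    resid = _center(y)  # residual of the intercept-only equation
    # Candidates are kept orthogonal to the intercept and to entered predictors.
    cand = {name: _center(col) for name, col in columns.items()}
    selected = []
    while cand:
        sse = _dot(resid, resid)
        best_name, best_gain = None, 0.0
        for name, x in cand.items():
            xx = _dot(x, x)
            if xx < 1e-12:  # nothing left in x that is not already in the equation
                continue
            gain = _dot(x, resid) ** 2 / xx  # drop in residual sum of squares
            if gain > best_gain:
                best_name, best_gain = name, gain
        dof = n - len(selected) - 2  # residual degrees of freedom after entry
        if best_name is None or dof <= 0:
            break
        sse_new = max(sse - best_gain, 0.0)
        f_ratio = best_gain / (sse_new / dof) if sse_new > 0.0 else float("inf")
        if f_ratio < f_to_enter:
            break
        x = cand.pop(best_name)
        xx = _dot(x, x)
        beta = _dot(x, resid) / xx
        resid = [r - beta * xi for r, xi in zip(resid, x)]
        for name in list(cand):  # remove the entered predictor from the others
            g = _dot(cand[name], x) / xx
            cand[name] = [ci - g * xi for ci, xi in zip(cand[name], x)]
        selected.append(best_name)
    return selected


# Hypothetical data: y depends on x1 and x3; x2 duplicates x1; x4 is unrelated.
n = 12
x1 = [float(i) for i in range(n)]
x2 = [2.0 * v + 1.0 for v in x1]
x3 = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]
x4 = [0.0, 0.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0, -1.0, 1.0, 0.0, 0.0]
noise = [0.5, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.5, 0.5]
y = [3.0 * a + 5.0 * b + e for a, b, e in zip(x1, x3, noise)]

chosen = forward_stepwise({"x1": x1, "x2": x2, "x3": x3, "x4": x4}, y)
print(chosen)
```

In this constructed example only one member of the collinear pair (x1, x2) enters, followed by x3, while x4 fails the F-ratio test of 4.0. This is the behavior that makes the stepwise approach attractive when predictors are highly correlated.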

A stepwise screening procedure is used here for two fundamental
reasons. First, a stepwise procedure extracts maximum
information out of minimum variables, and variables that add
little information are not used. Second, and more importantly,
Neter and Wasserman (1974) show that if two or more
potential predictors are highly correlated, retention of both
may have a deleterious effect on interpretation of the equations.
The problem is called multicollinearity. Statistically, the
effect is to have little additional reduction in the
unexplained variance, while decreasing the degrees of freedom
in the equation. Since at least some of the potential predictors
are highly correlated, multicollinearity could be a problem.
By using a stepwise regression approach, the problem is
circumvented. Whenever a stepwise regression scheme is used,
a decision on how many predictors are to be used needs to be
made. Two possible approaches are to use a predetermined number
of predictors, so that the number of terms in each final
equation is identical, or to use all terms that meet a predetermined
significance tolerance level. For this study,
all predictors that significantly reduce the residual variance are
included in the equations, so that the number of terms in the
various equations differs. A tolerance level (F-ratio) of
4.0 is used for this study (Dixon and Brown, 1979).

Finally, the form of the equations, either linear or
polynomial, must be decided. The simplest type of polynomial
regression involves using all first-order predictors, and
nonlinear combinations of the first-order predictors in the
model. For instance, if there are 10 initially defined potential
predictors, then the set of predictors used in polynomial
regression includes all 10 first-order terms, all 10 second-order
(squared) predictors, plus the 45 nonlinear products of
pairs of distinct potential predictors. The use of polynomial
regression may occasionally be of aid in fitting the predictors to the
predictands when nonlinear cause and effect is anticipated.
Neumann and Leftwich (1977) use a second-order polynomial
regression to forecast typhoon movement, although their predictors
do not include synoptic forcing explicitly. With 26
potential predictors, as in this study, the number of polynomial
predictors becomes unwieldy. A further justification
for not using polynomial regression is that the predictands
give no evidence of interacting nonlinearly with the predictors.
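The growth in the number of second-order polynomial predictors can be checked with a short count. The helper name below is hypothetical; the counting rule (first-order terms, squared terms, and products of distinct pairs) follows the 10-predictor example in the text.

```python
from math import comb


def polynomial_predictor_count(p):
    """First-order terms + squared terms + products of distinct pairs."""
    return p + p + comb(p, 2)


print(polynomial_predictor_count(10))  # 10 + 10 + 45
print(polynomial_predictor_count(26))  # 26 + 26 + 325
```

For the 26 potential predictors of this study the count is 377, compared with 65 for the 10-predictor example, which illustrates why the polynomial set becomes unwieldy.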

In summary, 14 linear regression equations are to be formulated
for each atmospheric pressure level, with predictands
being 12- through 84-h zonal and meridional displacements
(in nautical miles) in 12-hour increments. Predictors will
be selected stepwise from a set of 26 potential predictors
over 454 (or fewer) dependent data cases. Fifty cases have been
held back to test the equations.

The regression equations are calculated using the University
of California BMDP computer routine for linear stepwise
regression (Dixon and Brown, 1979). Before presenting the
equations, their ability to explain variation in the predictand
is examined by use of the R² statistic. This quantity may
be interpreted as the percent explained variance in the predictand
by the regression equation (using the dependent data
cases). The R² value for each regression equation is shown
in Table 5-3.
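For reference, the explained-variance statistic can be computed as one minus the ratio of the residual sum of squares to the total sum of squares about the mean. The sketch below assumes this standard definition; the function name and the sample values are hypothetical.

```python
def r_squared(observed, predicted):
    """Fraction of the predictand variance explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    sst = sum((o - mean_obs) ** 2 for o in observed)              # total
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual
    return 1.0 - sse / sst


observed = [100.0, 150.0, 200.0, 250.0]   # hypothetical displacements (n mi)
predicted = [110.0, 140.0, 210.0, 240.0]  # hypothetical regression estimates
print(round(r_squared(observed, predicted), 3))
```

Here the total sum of squares is 12500 and the residual sum of squares is 400, so the statistic is 0.968.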

TABLE 5-3

Sample size and R² statistic for each zonal and meridional
regression equation by forecast time and atmospheric level.

                        FORECAST INTERVAL (HR)
                  12     24     36     48     60     72     84
NUMBER OF
CASES             35?    351    329    256    233    163    150
ZONAL EQUATIONS
  500mb          .794     ?    .685     ?    .568   .556   .444
  700mb            ?      ?    .680   .600     ?    .550   .310
  850mb          .784     ?    .651     ?      ?      ?    .384
MERIDIONAL EQUATIONS
  500mb            ?    .476   .404   .354     ?      ?    .208
  700mb          .540   .486   .419   .347     ?      ?    .184
  850mb          .502   .463     ?    .323     ?      ?    .103

Several properties are immediately seen from the R² values.
First, the zonal equations appear to explain a greater portion
of the total (zonal) movement variation than do the meridional
equations. Over 75% of the total (zonal) variation in the
12-h movement is explained by the equations at each of the
three atmospheric levels. The maximum meridional variation
explained (54%) is for the 12-h movement using 700mb EOF
coefficients. Matching forecast times and levels (excluding
the 84 hour forecast from the 700mb equations), the zonal R²
is always at least .24 greater than the meridional R² for the
same time period and level. The increased ability of the zonal
equations is expected because there is greater variation in
the zonal movement than the meridional movement. The means
and standard deviations of the zonal and meridional displacements
at the various forecast times are shown in Table 5-4.


TABLE 5-4 


Means and standard deviations of the predictands
(in nautical miles) for the dependent
sample. See text for details.


FORECAST TIME (HOURS) 


24 Sc 48 


Meridional 
displacement 


mean 119 181 
standard (100) (1450) 
deviation 

Zonal 

displacement 
mean 93 129 


standard (176) (233) 
deviation

The mean movement for both directions is roughly the same
magnitude, and indicates an average track toward the northwest.
A more significant difference in the motion is seen in
the standard deviations, which are larger for the zonal motion
than for the meridional motion. As both the zonal and meridional
components contribute approximately the same error
magnitude in the regression equations, the R² for the zonal
motion will be significantly greater since there is more
variance to be explained.
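The argument can be made concrete with a small calculation. The sketch assumes that the explained variance is one minus the ratio of error variance to predictand variance. The residual error of 80 n mi is a hypothetical value chosen for illustration; the standard deviations of 100 and 176 n mi are the first pair of meridional and zonal values listed in Table 5-4.

```python
def r2_from_error(residual_sd, predictand_sd):
    """Explained variance when the error and predictand spreads are known."""
    return 1.0 - (residual_sd / predictand_sd) ** 2


RESIDUAL_SD = 80.0  # hypothetical error magnitude (n mi), same for both components

meridional_r2 = r2_from_error(RESIDUAL_SD, 100.0)
zonal_r2 = r2_from_error(RESIDUAL_SD, 176.0)
print(round(meridional_r2, 3), round(zonal_r2, 3))
```

With an identical error magnitude, the component with the larger spread (zonal) yields the much larger explained variance, about 0.79 compared with 0.36 in this illustration.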

The second property seen immediately in the R² values in
Table 5-3 is that they decrease rapidly in time for each
pressure level. For the 500mb equations, a general rule of
thumb is that the R² decreases by a value of .05 per 12 hour
increment. It is further seen (Table 5-4) that the standard
deviation of displacement increases every 12 hours, heightening
the significance of the decrease of the R² in time. Simply
stated, the equations predict movement well in the short term,
but the errors grow rapidly with increasing time.

The final property seen in the R² values is that the
accuracy of the equations is not a strong function of the
atmospheric level in the dependent sample case. The 500mb
R² values are generally larger than at the other two levels,
although these differences are not significant. A Student's
t-test, assuming non-identical variances in the population,
was conducted with the null hypothesis that there is no
significant difference in the R² values at the various levels.
In no case was the test statistic significant at even the
alpha equal .75 level. Therefore, the null hypothesis is
accepted that over the dependent sample there is no difference
in performance of the equations at the different atmospheric
levels.

Tables 5-5 and 5-6 present the regression coefficients 
of the 500mb equations by direction of movement. For example, 
the 500mb meridional regression coefficients for all seven 
forecast times are given in Table 5-5. The first value given 
is the intercept. The final regression equation prediction 
of displacement is obtained by summing over the product of 
all non-zero regression coefficients and the variable asso- 
ciated with the coefficient. None of the 500mb equations 
use more than 10 predictors. In seven of the 28 equations, 
six or fewer predictors are used. Therefore, these equations 
are very simple to use. A past movement variable was always 
the first variable selected in the stepwise procedure, so 
persistence does play a role in the predicted movement. The 
predictions are not simply persistence forecasts, however,
since in general four or five EOF coefficient predictors are
chosen in each equation. Therefore, forcing also plays a
crucial role in the storm movement. Finally, maximum wind
predictors are of little consequence in the final equations,
indicating little impact on the 12-h (or greater) time scale
storm motion (excluding short term trochoidal path oscillation).
The resultant equations for the 700 and 850mb data are shown
in Appendix B. It is also noted that of the potential
predictors, very little information would be lost by
excluding all past displacement variables except for the
12-h period prior to base time. Additionally, of the
intensity predictors, the most frequently selected was the
12 hour prior intensity. Therefore, it was decided to rederive
the equations using only 13 potential predictors
(the 10 coefficients at the given level, Platl, Plonl and
Amwl). Results of the equations, in the form of R² statistics,
derived on the smaller set are given in Appendix B.
The remainder of the results presented in this chapter refer 
to the equations derived using the complete set of all 26 
potential predictors. 
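As stated earlier in this chapter, a predicted displacement is the intercept plus the sum of each non-zero regression coefficient multiplied by its predictor. A minimal sketch follows; the predictor names and all numerical values are hypothetical and are not taken from Tables 5-5 or 5-6.

```python
def predict_displacement(intercept, coefficients, predictors):
    """Intercept plus the sum of coefficient * predictor over non-zero terms."""
    return intercept + sum(b * predictors[name]
                           for name, b in coefficients.items() if b != 0.0)


# Hypothetical equation: predictors not selected stepwise carry a zero coefficient.
intercept = 10.0
coefficients = {"eof_1": 2.0, "eof_3": -1.5, "prior_12h_zonal": 0.5, "max_wind_0h": 0.0}
predictors = {"eof_1": 4.0, "eof_3": 2.0, "prior_12h_zonal": 90.0, "max_wind_0h": 65.0}

print(predict_displacement(intercept, coefficients, predictors))
```

In this example the terms are 10 + 8 - 3 + 45, giving a displacement of 60 n mi; the maximum wind predictor does not contribute because its coefficient is zero.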

Results presented thus far have been drawn from the
regression equations using the dependent data set. A true
test of a regression equation comes through testing with
independent data. This testing is critical in determination
of accuracy of the model. The JTWC annual typhoon report
publishes, in addition to best track and warning positions,
the forecast errors for 24, 48 and 72 hour forecasts. The
regression model was tested with the independent data and
is compared to the official JTWC forecast error, which
serves as a benchmark. Of the 50 independent cases, only
45 have JTWC official forecasts at 24 hours, 31 have official
forecasts at 48 hours and only 17 at 72 hours. Admittedly,
the sample size of the independent storms is quite
small, but inferences on aptness of the model may still be
drawn. Both the complete set of results for the independent
storms, and the homogeneous set where both JTWC and the
regression model errors are available will be shown.

The overall performance (Table 5-7) of the regression
equations on the entire set of 50 independent cases is first
examined to determine if there is consistency in the forecasts
(indicated by small standard deviations) and to determine
in general how well the equations forecast the motion.

TABLE 5-7

Mean and standard deviation forecast vector error
(nautical miles) of 24, 48 and 72 hours for the
set of 50 independent storms.

                               HOUR FORECAST
                             24       48       72
Sample size                   ?       43        ?
500mb forecast error
  mean                      88.4    176.4    279.4
  standard deviation          ?        ?        ?
700mb forecast error
  mean                        ?        ?        ?
  standard deviation          ?        ?      178.7
850mb forecast error
  mean                     114.9    205.4    358.0
  standard deviation       105.8    146.1      ?
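The statistics in Table 5-7 combine the zonal and meridional errors into a single vector error. The sketch below assumes that the vector error is the magnitude of the difference between the forecast and best-track displacements, and that the standard deviation is the sample (n - 1) form; neither definition is stated explicitly in the text, and the component errors are hypothetical.

```python
import math


def vector_error(zonal_error, meridional_error):
    """Magnitude of the forecast position error from its two components."""
    return math.hypot(zonal_error, meridional_error)


def mean_and_sd(values):
    """Mean and sample standard deviation of a list of errors."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance)


# Hypothetical (zonal, meridional) forecast errors in nautical miles.
component_errors = [(30.0, 40.0), (-60.0, 80.0), (90.0, -120.0)]
errors = [vector_error(dx, dy) for dx, dy in component_errors]
mean_error, sd_error = mean_and_sd(errors)
print(errors, mean_error, sd_error)
```

The three hypothetical cases have vector errors of 50, 100 and 150 n mi, with a mean of 100 n mi and a standard deviation of 50 n mi.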


The 500mb equations outperformed the other two equation sets
by a wide margin, which is surprising. Similar differences
between levels did not appear in the errors of the dependent
sample, given in Table 5-8. A possible explanation is that
there is a greater variation in the synoptic forcing fields
at 500mb. This allows the 500mb equations to be less susceptible
to large forecast errors in cases where the predictors
have extreme values. It turns out that with few exceptions,
the 700mb errors are similar to the 500mb errors. Where the
700mb equations performed poorly, the results were much
worse than the 500mb equations. Therefore, it appears that
(at least over the independent cases) the 500mb equations
have a smaller likelihood to give a large forecast error.
This hypothesis needs to be tested more thoroughly as additional
data become available.


TABLE 5-8

Mean and standard deviation forecast vector error
(nautical miles) of 24, 48 and 72 hours for the
set of 454 dependent storms.

The next step in examination of the independent data
results is to compare the results of EOF regression forecasts
to the official JTWC forecasts, for those cases in which this is
possible. The mean and standard deviation errors for these
valid cases, and the benchmark JTWC official forecast error
statistics are shown in Table 5-9. A superior 500mb scheme
is again evident. More importantly, it is seen that the standard
deviation of error for the EOF regression scheme is less 
than for the JTWC official forecasts, which indicates the 
EOF regression scheme is less likely to have a large forecast 
error. The combination of small mean error and small standard 
deviation indicates the EOF scheme outperforms the JTWC 
official forecast. The 700 and 850mb equation forecasts were 
again poorer than the 500mb forecasts, and appear to be about 
equal to the JTWC forecasts. 

Finally, the EOF regression scheme is compared to the
JTWC official forecast on a case-by-case basis in Figs. 5-1
through 5-9. Any points lying above the straight line on
the graphs represent cases in which the EOF scheme outperformed
the JTWC official forecasts. The 850mb results
(Figs. 5-3, 5-6 and 5-9) show little difference between the
schemes. The 700mb equations (Figs. 5-2, 5-5 and 5-8) show,
in general, a better forecast by the EOF scheme, as the bulk
of the points lie above the no-difference line. The overall
comparison statistics appear to have been affected by a few
large forecast errors, especially at 24 hours. This tendency
toward large errors does not appear as dramatically in the
500mb forecasts (Figs. 5-1, 5-4 and 5-7). The superiority
of the EOF forecasts to the JTWC official forecasts needs
to be examined over a larger set of independent data.

Fig. 5-1. Comparison of the forecast error for the independent
data cases. Schemes compared are the
500mb EOF regression scheme versus the JTWC
official forecast, for a 24 hour forecast.
Units are in nautical miles.

Fig. 5-2. Similar to Fig. 5-1, except the 700mb EOF
regression forecast is compared to JTWC official
forecast for a 24-hour forecast.

Fig. 5-3. Similar to Fig. 5-1, except the 850mb EOF
regression forecast is compared to JTWC official
forecast for a 24-hour forecast.

Fig. 5-4. Similar to Fig. 5-1, except the 500mb EOF
regression forecast is compared to JTWC official
forecast for a 48-hour forecast.

Fig. 5-5. Similar to Fig. 5-1, except the 700mb EOF
regression forecast is compared to JTWC official
forecast for a 48-hour forecast.

Fig. 5-6. Similar to Fig. 5-1, except the 850mb EOF
regression forecast is compared to JTWC official
forecast for a 48-hour forecast.

Fig. 5-7. Similar to Fig. 5-1, except the 500mb EOF
regression forecast is compared to JTWC official
forecast for a 72-hour forecast.

Fig. 5-8. Similar to Fig. 5-1, except the 700mb EOF
regression forecast is compared to JTWC official
forecast for a 72-hour forecast.

Fig. 5-9. Similar to Fig. 5-1, except the 850mb EOF
regression forecast is compared to JTWC
official forecast for a 72-hour forecast.

One final point of interest on these figures is that
both the 48-hour 850mb and 72-hour 700mb forecasts have an
unusually shaped clustering of EOF regression errors at
about the 150 n mi error level. No physical explanation
for this clustering is known. It is very likely the event
is an artifact of the data. It is, nevertheless, interesting,
and worth closer examination as more data become available.

A final graphical representation of the differences in 
forecasting methods is shown in Figs. 5-10 through 5-12. 
These graphs are divided by atmospheric level, and on each 
are the JTWC error over the independent sample, the EOF 
regression forecast over the complete and homogeneous inde- 
pendent sample as well as the EOF forecast over the dependent 
sample plotted as a function of forecast time. Once again,
the EOF regression scheme forecast appears superior over both
the short and long term for the 500mb equations.

Fig. 5-11. Similar to Fig. 5-10, except EOF regression
results obtained from 700mb equations.

Fig. 5-12. Similar to Fig. 5-10, except EOF regression
results obtained from 850mb equations.

VI. POTENTIAL FOR USE WITH INDEPENDENT DATA


Based on the results of the previous section, it appears 
that EOF regression forecasting has potential for improving 
forecasts of tropical storm movement. Using a limited inde- 
pendent data set, the method has been shown to be an improve- 
ment on the JTWC official forecasts. There are still 
unanswered questions concerning use of the model operationally 
on independent storms. The regression equations were derived
using orthogonal coefficients derived from one set of eigenvectors.
The regression equations derived are strictly valid
only for tropical cyclone cases in which the coefficients
are obtained from these identical vectors, so that the coefficients
have a consistent meaning for each storm. If a new
case is added to the dependent set, the set of vectors no
longer exactly explains the maximum variation in all of the
observations. Therefore, the stability of the eigenvectors
and coefficients must be examined by determining whether the
vectors and coefficients remain nearly the same if additional
cases are added. This stability will be examined theoretically,
and by a simplified experiment.

The set of dependent eigenvectors is defined as those
vectors obtained from the original data set. Independent
vectors are obtained from the combined set of original
dependent cases plus the new independent case. If the
eigenvectors for the dependent data set are very close to
the eigenvectors for the independent set, then little error
will be introduced by using the dependent eigenvectors to
compute the coefficients for the independent case. In this
case, the independent case coefficients may be used directly
in the regression equations as initially derived. If the
eigenvectors are not consistent, the regression equations
must be re-derived for every new forecast, including the
recomputation of a new set of eigenvectors and coefficients
using all data cases. Because of the large amount of computation
in this case, it is highly desirable that the coefficients
and vectors are consistent for independent data.

As in Chapter III, the eigenvectors are derived from
solving the eigenvector equation using the known matrix R,
where R is the correlation matrix of the normalized grid
points:

$$ R = \frac{1}{N} Z Z' \qquad (1) $$

R is a square matrix of order equal to the number of dimensions
(grid points), M. The set of eigenvectors constructed
over the dependent sample should theoretically be stable if
N (number of individual cases) is large. That is, addition
of a single independent case should have very little effect
on the shape of the observation surface in space. Inclusion
of an additional data case changes R by:

$$ R_{NEW} = \frac{N}{N+1} R_{OLD} + \frac{1}{N+1} z z' \qquad (2) $$

where $R_{NEW}$ is the new (independent) correlation matrix after
addition of the new observation case, $R_{OLD}$ is the original
(dependent) correlation matrix, N (N+1) the number of cases
prior to (after) inclusion of the new case, and z is the
(M × 1) vector of normalized D-values for the independent
case. If N is initially very large, the second term in
(2) is negligible compared to the first term, since the
normalized observation elements are rarely greater than two
or three. Therefore, to a very close approximation,

$$ R_{NEW} \approx R_{OLD} \qquad (3) $$

and the eigenvalues and vectors obtained from the dependent
data should be almost identical to those obtained over all
cases.
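The update relation for the matrix R, namely the old matrix weighted by N/(N+1) plus the outer product of the new normalized observation weighted by 1/(N+1), can be checked numerically. The sketch below assumes, as the relation itself does, that the normalization (means and standard deviations) is held fixed when the new case is added. The function names and the small three-point data set are hypothetical.

```python
def second_moment(cases):
    """Average outer product of the normalized observation vectors."""
    n, m = len(cases), len(cases[0])
    return [[sum(c[i] * c[j] for c in cases) / n for j in range(m)]
            for i in range(m)]


def add_case(r_old, n, z):
    """Update R for one new case: (N * R_old + z z') / (N + 1)."""
    m = len(z)
    return [[(n * r_old[i][j] + z[i] * z[j]) / (n + 1) for j in range(m)]
            for i in range(m)]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical normalized observations: N = 4 cases on M = 3 grid points.
cases = [[1.0, -0.5, 0.2], [-1.2, 0.7, 0.1], [0.3, 1.1, -0.9], [-0.1, -1.3, 0.6]]
z = [2.0, -1.0, 0.5]  # hypothetical new (independent) case

r_old = second_moment(cases)
r_new = add_case(r_old, len(cases), z)
direct = second_moment(cases + [z])
print(max_abs_diff(r_new, direct) < 1e-12)
```

Rearranging the relation shows that the change in R equals the difference between the outer product and the old matrix, divided by N + 1. With normalized elements no larger than three in magnitude and correlations no larger than one, each element changes by at most about 10/(N+1), or roughly 0.025 for N = 400.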

The above theory was tested with 500mb data using 
dependent samples of N = 50, 100, 150, 200, 300, and 400 
cases with 33 independent cases. The 33 independent case 
orthogonal coefficients were computed in two ways: 

(1) As a control, the independent case was added to the
dependent sample, R computed, and the true eigenvectors and
orthogonal coefficients recalculated. Therefore, 33 separate
sets of eigenvectors were computed. The eigenvectors and
orthogonal coefficients are the values that minimize the
deviation from the mean state for all of the data.

(2) The test method involved computing the eigenvectors
only once from the dependent set (N cases). These vectors
were then used to compute the orthogonal coefficients for
the independent cases. If regression equations are not to
be re-derived for every new operational forecast, the coefficients
in the test method should be nearly identical to
those from the control.

Method (2) requires considerably less computer time;
however, the question is whether the coefficients are sufficiently
accurate. Only the first ten coefficients are
examined since they represent the primary contribution to
the 500mb height fields. The comparisons for the first four
coefficients are shown in Figs. 6-1 through 6-4. The
quantity

$$ Y_i = \sum \left| \mathrm{Cof}_{i1} - \mathrm{Cof}_{i2} \right| \qquad (4) $$

is summed over the 33 independent cases. $\mathrm{Cof}_{i1}$ is the ith
coefficient (1 to 10) computed using method (1) and $\mathrm{Cof}_{i2}$
is the ith coefficient computed using method (2). The first
two moments of $Y_i$ are examined to determine the stability of
the coefficients. As N increases, the standard deviations
of the differences in the coefficients should become smaller.
The expected "funnel-shape" with increasing N is seen
clearly in the first orthogonal coefficient (Fig. 6-1),
while coefficients 2 and 3 (Figs. 6-2 and 6-3) tend to have
the expected shape only for N greater than 100. For the
N = 50 case the mean error for both coefficients 2 and 3
is very large compared to the coefficient size (normally
less than ten). This indicates the first three coefficients
may be derived from the dependent set of eigenvectors determined
from as few as 100 cases. An unexpected result is
found with the fourth coefficient (Fig. 6-4), when N = 400
(also at N = 100). The large standard deviation indicates
that at least some of the independent cases have very large
error in this coefficient. A similar indication of unstable
coefficients also occurs in the sixth, seventh and eighth
coefficients.

Fig. 6-1. Comparison of coefficient 1 derived over
dependent and independent samples. See text
for details. On the figures, the middle line
is the mean and the outer two lines the 95%
confidence intervals (plus/minus two standard
deviations). The x-axis is the number of cases
used.

The source of the error in the calculation of the 
coefficients was found to be the structure of the 
characteristic equation. Any single vector that is a solution 
eigenvector also represents infinitely many other vectors 
that are also solutions, which differ only by a constant 
scaling factor (positive or negative). In EOF analysis, the 
coefficients depend upon the numerical values (and signs) of 
the eigenvectors. If one or two of the vectors change signs 
during numerical solution of the eigenvectors, then the 
coefficients must also reverse, which changes the EOF 
reconstruction. It is important to notice that the sign 
reversal actually occurs in deriving the new eigenvectors 
when the new independent case is added. In certain cases, 
the sign of the coefficient changes, although the magnitude 
of the coefficient remains almost the same. In the cases 
in which some of the eigenvectors reversed signs, the error 
between coefficients is large. Even for these cases, the 
difference in the absolute values of the coefficients 
remains small. This is demonstrated in Fig. 6-5, in which 
the coefficient 4 differences are based only on the magnitude 
of the coefficients from the control and test methods. Large 
errors in the other coefficients are similarly reduced when 
the error differences are between absolute values of the 
coefficients. Once the eigenvectors and coefficients are 
derived from the dependent set, and the associated regression 
equations are generated, this set of eigenvectors must be 
used with any independent cases. Even though the dependent 
set may be quite large, the addition of a single new case 
will introduce the possibility of a sign change in one of 
the eigenvectors, and a reversal in sign of the coefficients. 
This would invalidate the original regression equation set, 
and require a re-derivation of both the eigenvectors and 
the regression equations with each new entry into the 
sample. 
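
The sign indeterminacy described above can be illustrated with a short 
numerical sketch. The example is not the procedure used in this study. It 
uses a small synthetic data set (200 cases on 12 grid points rather than 
120), and the helper `fix_signs`, with its convention that the 
largest-magnitude element of each eigenvector be positive, is an assumption 
introduced only for illustration. It shows that reversing an eigenvector 
reverses the corresponding coefficient while leaving its magnitude 
unchanged, and that a fixed sign convention makes two equivalent 
eigenvector sets identical. 

```python
import numpy as np


def fix_signs(vectors):
    """Flip each column so that its largest-magnitude element is positive.

    This convention is an assumption for illustration, not part of the study.
    """
    rows = np.argmax(np.abs(vectors), axis=0)
    cols = np.arange(vectors.shape[1])
    signs = np.sign(vectors[rows, cols])
    signs[signs == 0] = 1.0
    return vectors * signs


# Synthetic stand-in for the height fields: 200 cases on 12 grid points.
rng = np.random.default_rng(0)
fields = rng.standard_normal((200, 12))
normalized = (fields - fields.mean(axis=0)) / fields.std(axis=0)
corr = normalized.T @ normalized / normalized.shape[0]

# eigh returns eigenvalues in ascending order; keep the 4 leading vectors.
eigvals, eigvecs = np.linalg.eigh(corr)
leading = eigvecs[:, ::-1][:, :4]

# An equally valid solution set in which the second eigenvector is reversed.
reversed_set = leading.copy()
reversed_set[:, 1] *= -1.0

# Coefficients of one case computed from each eigenvector set.
case = normalized[0]
coef_control = leading.T @ case
coef_test = reversed_set.T @ case

print(coef_control[1], coef_test[1])
print(np.allclose(fix_signs(leading), fix_signs(reversed_set)))
```

The second coefficient changes sign between the two sets while its absolute 
value is unchanged, which is the behavior seen when the coefficient 
differences are based only on magnitudes. 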

The reversal in sign of the coefficients and vectors is 
probably due to computer round-off error. Solution of a 120 
dimension eigenvalue problem requires simultaneous solution 
of 120 homogeneous equations, which is an extremely 
ill-conditioned problem (Gerald, 1977). The probability of 
catastrophic round-off error increases dramatically as the 
number of dimensions increases. However, this reversal 
problem is not significant in the study, as long as the 
coefficients for independent cases are calculated from 
dependent eigenvectors. 

Further attempts to isolate the conditions under which 
this reversal occurs were without success. Random tests 
were conducted in 3, 5, 9 and 20 dimensions. Not until 
dimension size reached 20 were the first reversals noticed. 
The fact that the reversal does not occur until higher 
dimension systems are used is consistent with the argument 
above, because the greater the number of dimensions, the 
greater the probability for catastrophic round-off error. 

Because the coefficients calculated by the two methods 
have consistent magnitudes, it may be concluded that the 
coefficients computed for independent cases using the same 
dependent eigenvectors will introduce very little error to 
the movement forecast. Thus, implementation of these EOF 
regression forecasts with independent cases becomes straight- 
forward. Only two major operations are required. First, 
the EOF orthogonal coefficients from the dependent set of 
eigenvectors are stored. This involves multiplication of 
a (10 X 120) transpose matrix of truncated eigenvectors and 
the (120 X 1) normalized observation vector, which gives 
the ten coefficients. The second step involves simple 
substitution of the independent coefficients into the 
regression equations. The same eigenvectors and eigenvalues 
may be used indefinitely on independent storms, although it 
is recommended that the regression equations be updated at the 
conclusion of each typhoon season. 
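
The two operations can be sketched as follows. This is a minimal 
illustration under stated assumptions, not the code used in the study: the 
function names, the layout of the regression weights and the three extra 
predictors are hypothetical, and the stored mean, standard deviation and 
eigenvectors are replaced by random stand-ins of the right shape. 

```python
import numpy as np


def eof_coefficients(field, grid_mean, grid_std, eigvecs):
    """Project one 120-point height field onto the stored eigenvectors.

    eigvecs has shape (120, 10); its transpose is the (10 X 120) matrix
    applied to the (120 X 1) normalized observation vector.
    """
    normalized = (field - grid_mean) / grid_std
    return eigvecs.T @ normalized


def regression_forecast(predictors, weights, intercept):
    """Evaluate one linear regression equation (hypothetical layout)."""
    return intercept + weights @ predictors


# Random stand-ins for the stored dependent-sample statistics.
rng = np.random.default_rng(1)
grid_mean = rng.normal(5800.0, 50.0, size=120)
grid_std = rng.uniform(10.0, 40.0, size=120)
eigvecs, _ = np.linalg.qr(rng.standard_normal((120, 10)))  # orthonormal columns

# Step 1: coefficients for one independent case.
field = grid_mean + grid_std * rng.standard_normal(120)
coeffs = eof_coefficients(field, grid_mean, grid_std, eigvecs)

# Step 2: substitute into a regression equation. The extra predictors
# (past motion components and an intensity value) are made-up numbers.
other = np.array([0.5, -1.2, 45.0])
predictors = np.concatenate([coeffs, other])
weights = rng.standard_normal(predictors.size)
displacement = regression_forecast(predictors, weights, intercept=0.0)

print(coeffs.shape, float(displacement))
```

Because the stored eigenvectors never change, the signs of the coefficients 
stay consistent with those used to derive the regression equations. 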
VII. CONCLUSIONS AND FUTURE APPLICATIONS 


It has been shown that EOF coefficients correlate 
strongly with the observed motion. Therefore, use of EOF 
coefficients to represent the geopotential patterns in the 
environment of a tropical cyclone appears to be a valid 
approach for incorporation of synoptic information into a 
statistically based forecast. Incorporation of synoptic 
forcing by using EOF coefficients appears to have potential 
in forecasting tropical storm motion. Using an independent 
sample, an average of 17% improvement relative to JTWC 
official motion forecasts was obtained using the 500mb EOF 
regression equations. The use of 500mb equations gave 
better forecasts than either the 700mb or 850mb equations. 
In contrast, Brown (1981) found no significant difference 
in forecast ability in a map-typing forecast technique using 
the same three atmospheric levels. Since this is only a 
pilot study, the good results shown here need to be tested 
further with new data cases. Several conclusions and future 
applications are drawn from this study. 

(1) The regression equations were developed with a fairly 
small dependent data sample, and yet gave good results when 
tested with an independent sample. As the number of usable 
storm cases for the dependent sample increases, the 
regression equations should become progressively more refined. As 
the dependent data size increases, in any regression scheme, 
more extreme cases are typically forecast better. Large 
forecast errors should occur less frequently with a larger 
data sample. 

(2) This method of incorporating synoptic fields into the 
regression equations is not limited to observed fields. It 
is likely that coefficients derived from a 24-hour forecast 
field (from dynamic numerical weather prediction models) 
would improve the long range forecast. As seen in the study, 
the accuracy of the regression equations decreased sharply 
with increasing forecast interval. This study used only the 
current observed field. 
After 24 to 36 hours, it is expected that the forcing from 
the mid-latitudes would be significantly different. Use 
of a 24 hour prognosis field might give a better representa- 
tion of the forcing in the long-range forecast. 

(3) The model is extremely simple. Using only values 
representing the synoptic forcing in a limited grid region 
about the storm, past storm movement and an intensity 
measure (which proved to be of little value), the forecasts 
appear to be very good. If variables representing other 
physical features thought to impact storm movement are 
incorporated into the regression equations, even better 
forecasts should be possible. It is possible that the phase 
of equatorial planetary waves near the storm, and other 
large scale circulation features, may play a role in tropical 
storm movement. These waves are not easily detected. 
Holton (1972) notes that these waves are usually only 
identifiable in the stratosphere, although they extend 
throughout the troposphere and stratosphere. It is possible 
that these waves could be identified using an EOF analysis 
of the global band in the tropics at a mid-tropospheric 
level. For instance, a global tropical grid, with coverage 
to about 30°N and 30°S may be adequate to identify these 
waves (which would probably be seen in the first 5 to 10 
eigenvectors). These EOF coefficients could then be 
incorporated into the regression equation. A global grid 
could also possibly detect features such as the Walker 
circulation, and these features could be incorporated into 
the regression forecast. A better measure of storm intensity than the 
maximum wind used in this study needs to be found. Variables 
such as the radius of maximum winds should be tested as the 
data become available. The potential predictors that could 
be included are certainly not limited to those mentioned 
above. 

(4) The model was developed for use in the western North 
Pacific Ocean genesis basin, although the method could be 
developed for other genesis regions. The only difference 
in the different regions would be in the values of the 
regression coefficients. 

(5) Rotation of eigenvectors could also be tried to 
improve the model. If this were to be done, the number of 
retained vectors would have to be larger, to guard against 
underfactoring. 

(6) Application of the EOF scheme in its present form 
would be a simple matter. In fact, if the regression 
equations were updated only once a year, the entire forecast 
could conceivably be obtained on a hand-held programmable 
calculator with sufficient memory to store the mean and 
standard deviation of the grid points and all eigenvectors. 
Entry of the data at the 120 grid points is all that would 
be required to generate the movement forecast. The grid 
point data might be obtained using a Bessel linear inter- 
polation from the 63 X 63 FNOC analysis. Therefore, the 
scheme could be implemented for operational use with a 
minimum effort. 

In conclusion, the EOF regression scheme shows great 
promise for improvement of operational forecasts of tropical 
storm movement. In this pilot study, using a very simple 
model, the scheme performed very well. Potential improvement 
is possible through addition of more sophisticated physical 
forcing parameters and forecast dynamic fields that may 
affect storm movement. Further research in this area is 
definitely warranted. 
APPENDIX A 


700 AND 850MB EIGENVECTORS 


The first 10 eigenvectors for the 700 and 850mb levels 
follow. These are the vectors used in deriving the 
coefficients used in the regression equations. 


Fig. A1-1. Eigenvector 1 elements (multiplied by 100) at 
700mb with the tropical cyclone located at the X-position. 

Fig. A1-2. Similar to Fig. A1-1 except for eigenvector 2. 

Similar to Fig. A1-1 except for eigenvector 3. 

Similar to Fig. A1-1 except for eigenvector 5. 

Fig. A1-6. Similar to Fig. A1-1 except for eigenvector 6. 

Similar to Fig. A1-1 except for eigenvector 7. 

Similar to Fig. A1-1 except for eigenvector 9. 

Similar to Fig. A1-11 except for eigenvector 2. 

Similar to Fig. A1-11 except for eigenvector 3. 
APPENDIX C 
MODIFIED REGRESSION EQUATION RESULTS 


The enclosed table gives the R² statistic and the sample size 
for each atmospheric level, for the modified regression equations. 
These equations were derived using only 13 potential predictors: 
the 10 coefficients, Plat1l, Plontl and Anwt. The values may be 
compared with Table 5-3 using the entire set of 26 predictors. 
Sample size and R² statistic for each zonal and meridional 
modified regression equation by forecast time and atmospheric 
level. Entries marked -- are illegible in the source. 

                          FORECAST INTERVAL (HR) 

                   12     24     36     48     60     72     84 
NUMBER OF 
DEPENDENT CASES   409    409    387    307    281    203    184 

ZONAL EQUATIONS 
500mb              --     --   .672   .594   .549   .519   .457 
700mb            .758   .695   .649   .574   .544   .541     -- 
850mb              --   .676   .614   .536   .497   .503   .456 

MERIDIONAL EQUATIONS 
500mb              --     --     --   .229   .252     --   .169 
700mb              --     --     --     --   .220   .202   .145 
850mb            .431   .396   .357   .285     --     --     -- 